1. INTRODUCTION

We restrict our attention to finite, undirected graphs. Multiple edges
may be present, but loops are ignored. A pair of adjacent edges uv and vw,
with u ≠ v ≠ w, is lifted by deleting the edges uv and vw, and by adding
the edge uw. A graph H is immersed in a graph G if and only if a graph
isomorphic to H can be obtained from G by taking a subgraph and by
lifting pairs of edges.
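
The lifting operation can be sketched in a few lines of Python. This is only an illustration: the multiset representation (a Counter keyed by unordered vertex pairs) and the names edge and lift are assumptions of the sketch, not notation from the text, and the three vertices are assumed pairwise distinct so that no loop is created.

```python
from collections import Counter


def edge(u, v):
    """Canonical key for an undirected edge (hypothetical helper)."""
    return (u, v) if u <= v else (v, u)


def lift(edges, u, v, w):
    """Return a new multigraph in which one copy of uv and one copy of vw
    are deleted and one copy of uw is added."""
    if len({u, v, w}) != 3:
        raise ValueError("lifting is sketched here for three distinct vertices")
    if edges[edge(u, v)] < 1 or edges[edge(v, w)] < 1:
        raise ValueError("both edges uv and vw must be present")
    result = Counter(edges)
    result[edge(u, v)] -= 1
    result[edge(v, w)] -= 1
    result[edge(u, w)] += 1
    return +result  # unary plus drops entries whose count fell to zero


# Lifting the two edges of the path a-b-c leaves the single edge ac.
path = Counter({edge("a", "b"): 1, edge("b", "c"): 1})
lifted = lift(path, "a", "b", "c")
```

Because the input Counter is copied, repeated liftings can be explored without destroying the original multigraph.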

The  immersion  order  can  be  applied  to  a  number  of  combinatorial 
problems.  Consider,  for  example,  the  problem  of  deciding  whether  a  graph 
satisfies a given width metric. The cutwidth of G = (V, E) is the minimum,
over all linear layouts of V, of the maximum, over all pairs u and v of
consecutive vertices, of the number of edges from E that must be cut to
split the layout between u and v. Although NP-complete in general,
cutwidth  can,  in  principle,  be  decided  in  linear  time  for  any  fixed  width 
using  a  finite  but  unknown  list  of  immersion  tests.  Multidimensional 
generalizations  of  cutwidth,  termed  congestion  problems,  can  likewise  be 
solved  in  linear  time  if  only  one  has  the  right  collection  of  immersion  tests 
available.  These  and  other  problems  amenable  to  the  immersion  order 
arise  during  circuit  fabrication,  parallel  computation,  network  design  and 
many  other  processes. 
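
To make the definition of cutwidth concrete, the following Python sketch evaluates it by exhaustive search over all linear layouts. It is exponential in the number of vertices and is meant only for very small examples; it is not the linear-time method discussed in the text, and the function name and edge-list representation are assumptions of the sketch.

```python
from itertools import permutations


def cutwidth(vertices, edges):
    """Exhaustive cutwidth of a small multigraph given as a list of (u, v) pairs."""
    vertices = list(vertices)
    if len(vertices) < 2:
        return 0
    best = None
    for layout in permutations(vertices):
        pos = {v: i for i, v in enumerate(layout)}
        # For each gap between consecutive positions, count the edges crossing it.
        width = max(
            sum(1 for u, v in edges
                if min(pos[u], pos[v]) <= gap < max(pos[u], pos[v]))
            for gap in range(len(layout) - 1)
        )
        if best is None or width < best:
            best = width
    return best


k4_edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
k4_width = cutwidth(range(4), k4_edges)
path_width = cutwidth("abc", [("a", "b"), ("b", "c")])
```

For the complete graph on four vertices the middle gap of any layout is crossed by four edges, so the search returns 4, while a simple path has cutwidth 1.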

The graphs required for the aforementioned tests are called obstructions.
So, for example, when one knows all obstructions to cutwidth k, one
knows a characterization for the family of graphs that have cutwidth k or
less. Given the right collection of obstructions, linear-time decidability is
assured by bounding an input graph's treewidth [9], computing its tree
decomposition [2], and applying dynamic programming to test each
obstruction against the decomposition [19]. We refer the reader to [8] for
detailed information on this subject.




BOOTH  ET  AL. 


Unfortunately, little is known about immersion obstructions in general
or about practical immersion tests in particular. Complete graphs are often
obstructions. Testing for K1 and K2 is trivial. Detecting a K3 is easy: K3
is immersed in any graph of order 3 or more unless the graph is a tree with
no pair of multiple edges incident on a common vertex.

The first really difficult test, and the one we devise here, is for K4.
Observe that K4 is an obstruction for cutwidth 3, because any arrangement
of its vertices on a line requires a cut of four edges. Ours is the first
practical linear-time algorithm known for this task.

If  a  graph  contains  a  topological  K4,  then  it  also  contains  an  immersed 
K4. Thus we consider only those graphs with no topological K4. These are
exactly  the  series-parallel  graphs  [4].  But  K4  can  be  immersed  in  a 
series-parallel  graph.  As  a  simple  example,  consider  the  star  graph  with 
three  rays,  each  ray  with  three  edges,  as  shown  in  Fig.  1.  Clearly,  multiple 
edges  are  critical,  making  immersion  tests  potentially  more  complicated 
than  tests  in  the  more  familiar  minor  and  topological  orders  (see,  for 
example,  [15]). 

In  the  next  section  we  state  relevant  definitions  and  we  derive  a  few 
useful  technical  lemmas.  In  Sections  3  and  4,  we  present  algorithms  for  K4 
immersion  testing  and  K4  model  finding,  respectively.  Although  many  of 
the  explanatory  details  are  tedious,  especially  the  correctness  proofs,  the 
algorithms  themselves  are  straightforward  to  implement.  In  a  final  section 
we  discuss  efficiency,  applications  and  parallelization. 


2.  PRELIMINARIES 

We concentrate on edge-disjoint paths, which are relevant due to the
following alternate characterization of immersion containment: H =
(V_H, E_H) is immersed in G = (V_G, E_G) if and only if there exists an
injection from V_H to V_G for which the images of adjacent elements of V_H
are connected in G by edge-disjoint paths. Under such an injection, an
image vertex is called a corner of H in G; all image vertices and their
associated paths are collectively called a model of H in G. Our algorithms
exploit the edge connectivity of the input graph.


FIG. 1. A series-parallel graph with an immersed K4.
2.1. Three-Edge Connectivity

A cut point in a connected graph G is a vertex whose removal disconnects
G. Two vertices in G are said to be biconnected if they cannot be
disconnected by the removal of any cut point. A biconnected component of
G is the subgraph induced by a maximal set of pairwise biconnected
vertices.

A cut edge in G is an edge whose removal disconnects G. A pair of
edges, neither of which is a cut edge, is said to form a cut edge pair if
removing both of them disconnects G. Two vertices are three-edge-connected
if there are at least three edge-disjoint paths between them. G is
three-edge-connected if and only if it has no cut edges and no cut edge
pairs.
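
These definitions translate directly into a brute-force check, sketched below in Python. It removes every edge and every pair of edges in turn, so it is far from linear and serves only to make the definitions concrete on small inputs. The function names and the edge-list representation (parallel edges appear as repeated pairs) are assumptions of the sketch.

```python
from itertools import combinations


def connected(vertices, edges):
    """Depth-first connectivity test on an edge list."""
    vertices = list(vertices)
    if not vertices:
        return True
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(vertices)


def cut_edges(vertices, edges):
    """Indices of edges whose removal disconnects the graph."""
    return [i for i in range(len(edges))
            if not connected(vertices, edges[:i] + edges[i + 1:])]


def cut_edge_pairs(vertices, edges):
    """Index pairs of non-cut edges whose joint removal disconnects the graph."""
    bridges = set(cut_edges(vertices, edges))
    pairs = []
    for i, j in combinations(range(len(edges)), 2):
        if i in bridges or j in bridges:
            continue
        rest = [e for k, e in enumerate(edges) if k not in (i, j)]
        if not connected(vertices, rest):
            pairs.append((i, j))
    return pairs


def is_three_edge_connected(vertices, edges):
    return (connected(vertices, edges)
            and not cut_edges(vertices, edges)
            and not cut_edge_pairs(vertices, edges))


triangle = [("a", "b"), ("b", "c"), ("a", "c")]
triangle_pairs = cut_edge_pairs("abc", triangle)
```

In a triangle no single edge is a cut edge, but every pair of edges is a cut edge pair, so the triangle is not three-edge-connected.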

A three-edge-connected component of G = (V, E) is a graph G' = (V', E')
where V' ⊆ V is a maximal set of vertices that are pairwise
three-edge-connected in G. E' contains all edges induced by V' plus a (possibly
empty) set of virtual edges defined as follows: for {u, v} ⊆ V', a virtual edge
uv is added to E' for each distinct {x, y} ⊆ V − V' such that ux and vy
form a cut edge pair in G. Note that, due to the possible presence of
virtual edges, a three-edge-connected component will not necessarily be a
subgraph.

Lemma 1. If K4 is immersed in G, then K4 is immersed in some
three-edge-connected component of G.

Proof. Let a, b, c, and d denote the corners of a K4 model in G.
These corners are joined (in G) by at least six edge-disjoint paths: [ab],
[ac], [ad], [bc], [bd], and [cd]. Thus a and b are connected by at least
three edge-disjoint paths: [ab], [ac][cb], and [ad][db]. Maximality ensures
that the three-edge-connected component containing a also contains b
and, by symmetry, c and d. Let G_a denote this component. If [ab] contains
edges not in G_a, then [ab] can be written as [au]ux[xy]yv[vb], where ux
and yv are a cut edge pair in G and uv is a virtual edge in G_a. Thus a and
b are connected within G_a by [au]uv[vb], which is edge disjoint from the
other five paths of the model. By symmetry, all pairs of corners are so
connected within G_a. □

The  proof  of  Lemma  1  can  be  generalized  to  any  three-edge-connected 
graph  immersed  in  another. 

For our purposes, a multigraph is said to be reduced if all but four
copies of any edge having multiplicity 5 or more are removed.
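
Reduction amounts to capping every edge multiplicity at four. A minimal Python sketch follows, assuming the multigraph is stored as a mapping from edges to multiplicities; that representation and the name reduce_multigraph are illustrative choices, not taken from the text.

```python
from collections import Counter


def reduce_multigraph(multiplicity, cap=4):
    """Keep at most `cap` copies of every edge; drop edges with no copies."""
    return Counter({e: min(count, cap)
                    for e, count in multiplicity.items() if count > 0})


g = Counter({("u", "v"): 7, ("v", "w"): 2})
reduced = reduce_multigraph(g)
```

Edges with multiplicity four or less are left unchanged, so applying the function twice gives the same result as applying it once.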

Lemma 2. If K4 is immersed in G, then K4 is immersed in the reduced
graph of G.

Proof. Let a, b, c, and d denote the corners of a K4 model in G, and
suppose five or more copies of the edge uv are contained within its six
edge-disjoint paths. Without loss of generality, assume these paths are
simple. For some pair of corners, say a and b, all five paths with an
endpoint at a or b contain both u and v. Either u is a corner, or it can be
made a corner by replacing a with u (deleting the three subpaths of the
form [au]). Similarly, either v is a corner or b can be replaced with v. G
therefore contains a K4 model with corners u, v, w, and x, where
{w, x} ⊆ {a, b, c, d}. At most one of [uw], [vw] must contain uv; at most
one of [ux], [vx] must contain uv. Thus, of the six edge-disjoint paths of
this model, at least two need not contain uv, and all but four copies of uv
can be eliminated. This construction is iterated until a model is obtained
whose edges each have at most four copies. Edges not in this model are
now removed until G is reduced. □

In  the  sequel,  we  assume  that  all  graphs  are  reduced. 

2.2.  Series-Parallel  Graphs 

Series-parallel  graphs  have  been  widely  studied,  and  are  characterizable 
in  several  ways.  As  mentioned  in  Section  1,  one  such  characterization 
relies  on  the  absence  of  a  topological  K4.  Topological  containment  can  be 
defined as a restricted form of immersion containment, with lifting permitted
only at vertices of degree 2. Alternately, topological containment can
be  viewed  as  an  injection,  but  with  vertex-disjoint  rather  than  edge-disjoint 
paths. 

Lemma 3. Each three-edge-connected component of a series-parallel graph
is series-parallel.

Proof. The proof is straightforward, by noting that virtual edges
introduce no additional vertex-disjoint paths. □

Another  useful  characterization  is  much  older,  and  based  on  graphs  that 
are  said  to  be  two-terminal  series-parallel  (henceforth  2TSP).  A  2TSP  graph 
is  defined  in  terms  of  base  graphs  and  two  types  of  composition  operators. 
A  base  graph  is  a  copy  of  K2 ,  with  vertices  (terminals)  labeled  “source” 
and  “sink.”  A  series  operator  combines  two  graphs  by  identifying  one’s 
source  with  the  other’s  sink.  A  parallel  operator  combines  two  graphs  by 
identifying source with source and sink with sink. Hence the
characterization: a graph is series-parallel if and only if its biconnected components
are  two-terminal  series-parallel. 
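
The base graph and the two composition operators can be sketched as follows. This is an illustrative Python fragment under stated assumptions: a 2TSP graph is represented as a triple (edge list, source, sink), vertices are fresh integers, and the two operands of a composition are vertex-disjoint. None of these names or conventions come from the text.

```python
from itertools import count

_fresh = count()  # supplies vertex labels that have not been used before


def base():
    """A copy of K2 on two fresh vertices: (edges, source, sink)."""
    s, t = next(_fresh), next(_fresh)
    return ([(s, t)], s, t)


def _rename(edges, old, new):
    """Identify vertex `old` with vertex `new` in an edge list."""
    return [(new if u == old else u, new if v == old else v) for u, v in edges]


def series(g1, g2):
    """Identify the sink of g1 with the source of g2."""
    e1, s1, t1 = g1
    e2, s2, t2 = g2
    return (e1 + _rename(e2, s2, t1), s1, t2)


def parallel(g1, g2):
    """Identify source with source and sink with sink."""
    e1, s1, t1 = g1
    e2, s2, t2 = g2
    return (e1 + _rename(_rename(e2, s2, s1), t2, t1), s1, t1)


# A triangle: a two-edge path composed in parallel with a single edge.
tri_edges, tri_s, tri_t = parallel(series(base(), base()), base())
tri_vertices = {v for e in tri_edges for v in e}
```

Composing two base graphs in parallel produces a double edge, which is how multiple edges arise in this representation.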

This  characterization  is  often  attractive  because  it  prompts  a  natural 
“decomposition  tree”  T  whose  labels  indicate  how  a  2TSP  graph  can  be 
broken  back  down  into  base  graphs  and  operators.  If  a  2TSP  graph  is 
merely a base graph e, T is a single vertex with label e. Otherwise, T is
formed from the decomposition trees, T1 and T2, of the pair of 2TSP
graphs used in the composition. The roots of T1 and T2 are joined to the
root of T, which is labeled S in the case of a series composition and P in
the case of a parallel composition. Nodes labeled S and P are termed
S-nodes and P-nodes, respectively.

We conclude this section by noting from [3] that if a simple graph H is
series-parallel, then |E_H| ≤ 2|V_H| − 3. From this bound and Lemma 2, we
know that all graphs of interest have at most a linear number of edges.


3.  TESTING  FOR  K4 

Let  G  denote  an  arbitrary  input  graph  with  n  vertices  and  m  distinct 
edges.  Without  loss  of  generality,  we  assume  G  has  already  been  reduced 
and  is  input  as  a  simple  graph  with  integer  weights  indicating  edge 
multiplicities. 

Our  method  to  test  for  the  presence  of  an  immersed  K4  proceeds  in 
three  steps.  Algorithm  decompose  is  first  invoked  to  determine  whether 
G  is  series-parallel.  If  G  is  series-parallel,  then  algorithm  components  is 
used  to  break  G  into  three-edge-connected  components.  Finally,  algorithm 
test is employed to search each three-edge-connected component
separately for an immersed K4.

3.1.  Algorithm  decompose 

Algorithm  decompose  is  modeled  on  the  method  of  [11].  It  determines 
whether  G  is  series-parallel  and,  if  so,  computes  a  decomposition  tree  for 
each  biconnected  component.  To  accomplish  this,  decompose  makes  use 
of  the  fact  that  for  any  edge  st  in  a  biconnected  graph  B  with  p  vertices, 
the  vertices  of  B  may  be  numbered  from  1  to  p  so  that  vertex  s  receives 
number  1,  vertex  t  receives  number  p,  and  every  vertex  except  s  and  t  is 
adjacent  to  both  a  higher  numbered  vertex  and  a  lower  numbered  vertex 
[14].  Such  a  numbering  is  called  an  s,  t-numbering  for  B. 
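
The conditions that define an s,t-numbering are easy to verify once a candidate numbering is given. The Python sketch below checks them; it does not compute a numbering, and the function name and the dictionary representation of the numbering are assumptions made for illustration.

```python
def is_st_numbering(number, edges, s, t):
    """Check that `number` (a dict vertex -> int) is an s,t-numbering."""
    p = len(number)
    # The numbers must be exactly 1, ..., p.
    if sorted(number.values()) != list(range(1, p + 1)):
        return False
    if number[s] != 1 or number[t] != p:
        return False
    for v in number:
        if v == s or v == t:
            continue
        neighbours = [b if a == v else a for a, b in edges if v in (a, b)]
        has_lower = any(number[x] < number[v] for x in neighbours)
        has_higher = any(number[x] > number[v] for x in neighbours)
        if not (has_lower and has_higher):
            return False
    return True


# A 4-cycle s-a-t-b-s together with the edge st.
edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t"), ("s", "t")]
good = is_st_numbering({"s": 1, "a": 2, "b": 3, "t": 4}, edges, "s", "t")
```

Orienting each edge from its lower-numbered endpoint to its higher-numbered endpoint, as algorithm decompose does, then yields a directed graph in which every vertex other than s and t has both an incoming and an outgoing edge.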

algorithm decompose(G)
input: a multigraph G
output: a series-parallel decomposition tree for each biconnected
        component of G if G is series-parallel, NO otherwise
begin
    find the biconnected components B_1, ..., B_k of G
    for i = 1 to k do
    begin
        choose a pair of adjacent vertices to be the source s and sink t in B_i
        find an s,t-numbering of B_i
        B'_i := the directed graph obtained by orienting each edge in B_i from
                the endpoint with the lower s,t-number to the one with the
                higher number
        if B'_i is a directed 2TSP graph
            then compute a series-parallel decomposition tree T_i for B_i
            else output NO and halt
    end
    output T_1, ..., T_k
end

The correctness of decompose is based on the observation that any pair
of adjacent vertices may be chosen as s and t [11]. Efficient methods for
finding biconnected components and computing s,t-numberings are known
[20, 5]. Techniques for determining whether directed graphs are 2TSP and
finding decomposition trees can be found in [21]. All these algorithms are
linear in n and m; thus decompose runs in O(n) time.

3.2.  Algorithm  components 

Algorithm  components  finds  the  three-edge-connected  components  of  a 
series-parallel  multigraph  in  linear  time.  The  input  to  components  is  a 
series-parallel  graph  and  a  series-parallel  decomposition  tree  for  each  of 
its  biconnected  components.  The  output  is  its  set  of  three-edge-connected 
components  (including  virtual  edges). 

We  proceed  by  first  removing  all  cut  edges.  These  are  easily  found 
because  each  cut  edge  is  contained  in  a  biconnected  component  consisting 
only  of  that  edge.  Notice  that  each  cut  edge  pair  must  be  contained  within 
some  biconnected  component.  Thus  it  suffices  to  give  an  algorithm  for 
computing  the  three-edge-connected  components  of  a  biconnected  2TSP 
graph. 

Let G be such a 2TSP graph with source s and sink t. Let e, f be a cut
edge pair in G. Let G1 and G2 be the graphs left when e and f are deleted
from G. We call this cut edge pair s,t-nonseparating if s and t are both in
G1 or both in G2. Otherwise we call the pair s,t-separating. We say an
s,t-nonseparating pair is special if its deletion, followed by the addition of
virtual edges, results in two graphs such that one contains s and t and the
other is three-edge-connected.

These definitions are illustrated in Fig. 2. In this figure, edges ab and cd
are a special pair of graph G. Deleting them and adding virtual edges ad
and bc gives G1, which contains both s and t, and G2, which is
three-edge-connected. Edge st and the virtual edge ad together form an
s,t-separating pair in G1. G11, G12, and G2 are the three-edge-connected
components of G.
FIG. 2. A two-terminal series-parallel graph with cut edge pairs.


For our purposes, the decomposition tree T for a 2TSP graph G must
be ordered. That is, if x is a tree node representing a graph formed by
composing G1 and G2 in series such that the sink of G1 is identified with
the source of G2, then the left child of x must be the root of a
decomposition tree for G1 and the right child of x must be the root of a
decomposition tree for G2. Thus the order among children of a series
node is fixed. The children of a parallel node can be in any order.
Additionally, we assume that an edge uv stored at a leaf of a decomposition
tree is represented by the ordered pair (u, v), where u has a smaller
number than v in the s,t-numbering used in decompose.

Our algorithm proceeds in two phases. In the first phase special pairs
are found and deleted (and appropriate virtual edges are added) until no
more are left. This leaves a collection of (isolated vertices and) 2TSP
graphs, one of which contains both s and t. We will call this graph G_{s,t}. All
other graphs in the collection are three-edge-connected. We can show that
G_{s,t} contains at most one cut edge pair. In the second phase the last
remaining cut edge pair, if it exists, is found and removed, and virtual
edges are added.

In  order  to  find  any  of  these  cut  edge  pairs  we  use  the  compressed 
decomposition  tree  for  the  graph.  A  compressed  decomposition  tree  is 
formed  from  a  binary  decomposition  tree  merely  by  identifying  all  pairs  of 
adjacent  nodes  that  are  of  the  same  type. 
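
Compression can be illustrated with a short recursive Python sketch. It assumes a tuple encoding of tree nodes, ("leaf", edge) for leaves and ("S", children) or ("P", children) for internal nodes, which is a convenience of this example rather than the linked-list structure used by the pseudocode in the text.

```python
def compress(node):
    """Merge every internal node into a parent of the same type,
    preserving the left-to-right order of the children."""
    kind, payload = node
    if kind == "leaf":
        return node
    children = []
    for child in payload:
        child = compress(child)
        if child[0] == kind:
            children.extend(child[1])  # splice the grandchildren in place
        else:
            children.append(child)
    return (kind, children)


a, b, c, d = (("leaf", name) for name in "abcd")
binary = ("S", [("S", [a, b]), ("P", [c, ("P", [d, a])])])
compressed = compress(binary)
```

After compression no internal node has a child of its own type, so S-nodes and P-nodes alternate along every root-to-leaf path.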

Let G be a biconnected 2TSP graph with a compressed decomposition
tree T. Special pairs can be found by processing T in a bottom-up fashion.
When a special pair e, f is removed, virtual edges are added and T is
modified to represent G'_{s,t}, the graph containing s and t that is left after
removing e and f from G (the other graph left is a three-edge-connected
component).

Pseudo-code  for  components  is  presented  in  the  following  text.  In  a 
compressed  tree,  each  internal  node  will  have  at  least  two  children,  stored 
in  a  linked  list  called  child  list.  Stored  along  with  each  tree  node  is  its  type 
(P,  S ,  or  leaf),  a  pointer  to  its  child  list  and,  if  it  is  a  leaf  node,  an  ordered 
pair  giving  the  endpoints  of  its  associated  edge. 
The following functions are also used:

left_child(x). For x a tree node, if x is not a leaf, this returns the
leftmost child in x's child list; otherwise, it returns the value null.

right_child(x). For x a tree node, if x is not a leaf, this returns the
rightmost child in x's child list; otherwise, it returns the value null.

next_sibling(q). For q a nonroot tree node, this returns the child
following q in the child list of the parent of q or null if no such child
exists.

leftmost_leaf(x). For x a tree node, if x is not a leaf, this returns
the leftmost node in x's child list that is a leaf or null if no such node
exists.

algorithm components(T)
input: a binary series-parallel decomposition tree T of a biconnected
       multigraph G
output: the three-edge-connected components of G
begin
    r := the root of T
    compress(r)
    remove_non_sep(r)
    remove_sep(r)
end

algorithm compress(x)
input: a node x in a binary series-parallel decomposition tree T
output: the compressed form of the subtree rooted at x
begin
    if x is a leaf node then return
    compress(left_child(x))
    compress(right_child(x))
    if x and left_child(x) are of the same type
        then in the child list of x, replace left_child(x) by the child list of
             left_child(x)
    if x and right_child(x) are of the same type
        then in the child list of x, replace right_child(x) by the child list of
             right_child(x)
end

algorithm remove_non_sep(q)
input: a node q in a compressed series-parallel decomposition tree T of a
       multigraph G
output: the graph G, after deletion of all s,t-nonseparating pairs that are
        contained in the subtree of T rooted at q, and addition of virtual
        edges
begin
    ch := left_child(q)
    while ch is not null
    begin
        if ch is not a leaf node then remove_non_sep(ch)
        ch := next_sibling(ch)
    end
    if q is an S-node
        then while q has two children that are leaves
        begin
            leaf1, leaf2 := the first two leaf-node children of q
            (u, v) := the ordered edge associated with leaf1
            (w, x) := the ordered edge associated with leaf2
            delete edges uv and wx from G
            add edges ux and vw to G
            create tree node new representing the ordered edge (u, x)
            if q has more than two children
                then replace all children of q between leaf1 and leaf2
                     (inclusive) by new
                else replace q by new
        end
end

algorithm remove_sep(root)
input: the root of a compressed series-parallel decomposition tree T of a
       multigraph G without any s,t-nonseparating pairs
output: the graph G after deletion of the s,t-separating pair, if present,
        and addition of virtual edges
begin
    if root has exactly two children
    then begin
        c1, c2 := the children of root
        if c1 is not a leaf node then c1 := leftmost_leaf(c1)
        if c2 is not a leaf node then c2 := leftmost_leaf(c2)
        if c1 and c2 are both non-null
        then begin
            (u, v) := the ordered edge associated with c1
            (w, x) := the ordered edge associated with c2
            delete edges uv and wx from G
            add edges uw and vx to G
        end
    end
end
Lemma 4. Algorithm components runs in O(m + n) time on a graph
with n vertices and m edges.

Proof. The algorithm takes time proportional to the size of the binary
decomposition tree, which is O(m + n). □

Thus, in our setting, components takes O(n) time. We note for
completeness that a more complex linear-time approach may be viable [18], by
modifying the ear decomposition techniques used to decide vertex
connectivity in [10].

3.3. The Correctness of components

Neither the components driver nor algorithm compress requires
discussion.

Consider algorithm remove_non_sep. Note first that remove_non_sep
cannot inadvertently remove an s,t-separating pair, because the edge
st must be a child of the root (which is a P-node because G is
biconnected), and remove_non_sep eliminates only edges that are children of
S-nodes.

To proceed, we classify edges and pairs of edges in a 2TSP graph as
follows. A single edge can be either a cut edge or a noncut edge. A pair of
edges can be: a pair of cut edges, an s,t-separating pair, an
s,t-nonseparating pair, or a noncut pair.

Let G1 and G2 be 2TSP graphs such that Gs is the graph formed by
composing them in series and Gp is the graph formed by composing them
in parallel. Suppose e is an edge in G1 and f is an edge in G2. Table 1
shows the relation between the class of edge e in G1, edge f in G2, and
the pair e, f in Gs and Gp. For example, if edges e and f are cut edges in
G1 and G2, respectively, then e and f must be an s,t-separating pair
in Gp.

Now suppose edges e and f are both in the 2TSP graph G1, and G2 is
any other 2TSP graph. Graphs Gs and Gp are as defined previously.
Table 2 relates the class of e and f in G1 to their class in Gs and Gp.


TABLE 1

Class of          Class of          Class of e and f
edge e in G1      edge f in G2      In Gs             In Gp

noncut edge       noncut edge       noncut pair       noncut pair
noncut edge       cut edge          f a cut edge      noncut pair
cut edge          noncut edge       e a cut edge      noncut pair
cut edge          cut edge          cut edges         s,t-separating

TABLE 2

Class of edges e and f
In G1                 In Gs                 In Gp

cut edges             cut edges             s,t-nonseparating
s,t-nonseparating     s,t-nonseparating     s,t-nonseparating
s,t-separating        s,t-separating        noncut pair
noncut pair           noncut pair           noncut pair

Let G be a 2TSP graph with compressed decomposition tree T. Let x
denote an arbitrary internal node in T, and let Tx denote the subtree of T
rooted at x. Then the 2TSP graph that has Tx as a decomposition tree is a
constituent graph for G with respect to T. For any edge e in G, let e
also denote the node in T with label e. If e and f are edges in G then the least
constituent graph containing e and f is the smallest constituent graph H of
G that contains both e and f. This graph has a decomposition tree Tz,
where z is the least common ancestor of e and f in T. Table 3 gives the
relation between the class of edges e and f in H and their class in G.

The following lemmas are used to justify the correctness of the
procedure for finding special pairs, removing special pairs, updating the
decomposition tree for the connected component containing s and t, and adding
virtual edges.

Lemma 5. Let G be a 2TSP graph, and let e and f be an s,t-nonseparating
pair in G. If H is the least constituent graph of G containing e and f, then
e and f are cut edges in H.

Proof. By the first two lines of Table 3, either e and f are cut edges in
H, as claimed, or they form an s,t-nonseparating pair. Because H is a
least constituent graph, H must be formed by composing two 2TSP graphs
H1 and H2 such that H1 contains e and H2 contains f. According to
Table 1, e and f cannot be an s,t-nonseparating pair in H. Therefore they
must be cut edges in H, as claimed. □


TABLE 3

Class of edges e and f
In H                  In G

cut edges             cut edges or s,t-nonseparating
s,t-nonseparating     s,t-nonseparating
s,t-separating        s,t-separating or noncut pair
noncut pair           noncut pair

Lemma 6. If G is a 2TSP graph and T is a compressed decomposition tree
for G, then edge e is a cut edge in G if and only if the root of T is an S-node
and e is a child of the root.

Proof. Suppose e is a cut edge in G. Then no ancestor of e in T is a
P-node, because subtrees rooted at P-nodes represent biconnected graphs.
Hence the path from the root of T to e includes only S-nodes. But
because T is compressed, e must be a child of the root, which must be an
S-node. Conversely, if the root of T is an S-node and e is a child of the
root, then every path from s to t in G must include e. Therefore e is a cut
edge. □

Lemma 7. If e and f are edges in a biconnected 2TSP graph G with a
compressed decomposition tree T, then e and f are an s,t-nonseparating pair if
and only if e and f are siblings whose parent is an S-node.

Proof. Let z be the least common ancestor of e and f in T. Let H be
the 2TSP graph having Tz as a decomposition tree. Note that H is the
least constituent graph of G that contains e and f.

Suppose e and f are s,t-nonseparating. By Lemma 5, e and f are cut
edges in H. Then, by Lemma 6, e and f are children of z and z is an
S-node.

Now suppose e and f are siblings whose parent is an S-node. This
implies that e and f are cut edges in H. Then, by Table 3, e and f must be
either cut edges or an s,t-nonseparating pair in G. Because G is
biconnected, e and f must in fact be an s,t-nonseparating pair. □
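
Lemma 7 suggests a direct way to list the s,t-nonseparating pairs of a biconnected 2TSP graph: collect every pair of leaves that are siblings under an S-node. The Python sketch below does this under an assumed tuple encoding of a compressed tree, ("leaf", edge) for leaves and ("S", children) or ("P", children) for internal nodes. The encoding, the function name, and the edge labels in the example are illustrative assumptions.

```python
from itertools import combinations


def nonseparating_pairs(node):
    """List pairs of edges stored at sibling leaves of an S-node.
    For a biconnected graph (P-node root) these are, by Lemma 7,
    exactly the s,t-nonseparating pairs."""
    kind, payload = node
    if kind == "leaf":
        return []
    pairs = []
    if kind == "S":
        leaves = [child[1] for child in payload if child[0] == "leaf"]
        pairs.extend(combinations(leaves, 2))
    for child in payload:
        pairs.extend(nonseparating_pairs(child))
    return pairs


# Edge st in parallel with the path s-a-b-t, where ab is a double edge.
tree = ("P", [("leaf", "st"),
              ("S", [("leaf", "sa"),
                     ("P", [("leaf", "ab1"), ("leaf", "ab2")]),
                     ("leaf", "bt")])])
pairs = nonseparating_pairs(tree)
```

In the example, deleting the edges labeled sa and bt separates the vertices a and b from s and t, while no other pair of edges is a sibling-leaf pair under an S-node.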

In  what  follows,  we  say  that  node  x  in  tree  T  occurs  “between”  nodes  y 
and  z  if  x  occurs  between  y  and  z  in  the  pre-order  traversal  of  T.  Let  Hx 
denote  the  graph  having  Tx  as  a  decomposition  tree. 

Lemma 8. Let G be a biconnected 2TSP graph and let T be a compressed
decomposition tree for G. In G, let e and f be an s,t-nonseparating pair whose
removal yields a graph G1 containing s and t, and another graph G2. Suppose
e occurs before f in T, and let (u, v) and (w, x) be the pairs stored with e and
f, respectively. Then the edges in G2 are {g | g occurs between e and f in T},
and the vertices in G2 are the endpoints of these edges plus {v, w}.

Proof. Because e and f are s,t-nonseparating, by Lemma 7, e and f
are siblings whose parent z is an S-node. Because G is biconnected, z's
parent, y, is a P-node.

Removal of e and f from Hz leaves three graphs H1, H2, and H3 such
that: H1 contains all edges represented by nodes occurring before e in Tz,
their associated vertices, and vertex u; H2 contains all edges represented
by nodes occurring between e and f and associated vertices plus {v, w};
and H3 contains all edges represented by nodes occurring after f in Tz
and associated vertices plus x. The source and sink of Hz are in H1 and
H3, respectively.

In Hy, e and f are an s,t-nonseparating pair whose removal leaves H2
and another graph containing H1, H3, and the portion of Hy not in Hz.
The source and sink of Hy are the source and sink of Hz and are not in
H2. Thus the claim holds for Hy.

Any graph formed by composing two 2TSP graphs, one of which has Hy
as a constituent, still has the claimed property because the paths in the
new graph that are not in Hy can only connect vertices not in H2. Because
no new paths are added from vertices in H2 to vertices not in H2, it is still
the case that removal of e and f separates the vertices in H2 from the rest
of the graph. Thus the claim also holds for any graph having Hy as a
constituent. □

Corollary  1.  Let  G ,  T,  e ,  and  f  be  as  defined  in  Lemma  8.  Let  G,'  be 
the  graph  consisting  of  G,  plus  virtual  edge  wc  and  let  G2  be  the  graph 
consisting  of  G2  plus  virtual  edge  vw.  Let  z  be  the  parent  of  e  and  f  in  T  and 
let  be  the  children  of  z  in  order  from  left  to  right  such  that  ri  —  e  and 

r-  =  /.  Let  g  be^a  tree  node  representing  g  —  ux;  the  ordered  pair  stored  with  g 
is  (u,x).  Let  h  be  a  tree  node  representing  h  —  vw;  the  ordered  pair  stored 
with  h  is  (v,  w). 

A decomposition tree for G1' is formed by replacing r_i, ..., r_j by node g if
i ≠ 1 or j ≠ k, and replacing Tz by g otherwise.

A decomposition tree for G2' is one of the following:

(a) empty, if j = i + 1;

(b) a P-node with children h and r_{i+1}, if j = i + 2;

(c) a P-node with two children, h and an S-node, which in turn has
children r_{i+1}, ..., r_{j-1}, otherwise.

Proof. We know by Lemma 8 that G1' is formed by replacing the
portion of G represented by nodes in T between e and f by a single edge
ux, so the decomposition tree for G1' is as claimed. We also know that G2'
consists of the edges represented by nodes in T strictly between e and f,
their associated vertices and vertices v and w, with the edge vw composed
in parallel. Because the nodes between e and f are children of an S-node,
the decomposition tree for G2' is as claimed. ∎
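The three cases for G2' can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the paper: tree nodes are encoded as the hypothetical tuples ("leaf", (a, b)), ("S", children), and ("P", children), the children of z are given as a 0-indexed list, and the function name is made up for this example.

```python
def tree_for_G2_prime(children, i, j, v, w):
    """Build a decomposition tree for G2' following Corollary 1.

    children: ordered children r_1..r_k of the S-node z (0-indexed list);
    i, j:     list positions of the leaves e and f, with i < j;
    v, w:     endpoints of the virtual edge h = vw.
    Returns None for the empty tree of case (a).
    """
    h = ("leaf", (v, w))              # hypothetical encoding of tree node h
    between = children[i + 1:j]       # nodes strictly between e and f
    if not between:                   # case (a): j = i + 1
        return None
    if len(between) == 1:             # case (b): j = i + 2
        return ("P", [h, between[0]])
    return ("P", [h, ("S", list(between))])   # case (c)
```

For example, with e at position 1 and f at position 3 of the child list, the single node at position 2 becomes the sibling of h under a new P-node.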

Lemma 9. Let G be a biconnected 2TSP graph and let T be a compressed
decomposition tree for G. A pair of s,t-nonseparating edges, e, f, is a special
pair in G if and only if for every sibling y of e and f in T that occurs between e
and f, y is not a leaf and Ty does not represent a graph containing an
s,t-nonseparating pair.

Proof. Because e and f are s,t-nonseparating, removal of e and f
yields two graphs G1 and G2 such that G1 contains s and t. Let G' be the
graph G2 plus the virtual edge. Edges e and f are special if and only if G'
is three-edge-connected. Let T' be the decomposition tree for G' as
described in Corollary 1. Let z be the parent of e and f in T.

Suppose e and f are special. We employ proof by contradiction and we
assume there exists a child y of z between e and f such that y is a leaf or
Ty represents a graph containing an s,t-nonseparating pair. G' must be
three-edge-connected, which implies T' has no cut edges or cut edge pairs.
If y is a leaf then, by Corollary 1, T' consists of a P-node with two
children. One of them is a leaf (representing the virtual edge) and the
other is either y or an S-node having y as a child. In either case the
structure of T' requires that the virtual edge and y form an s,t-separating
pair in G', a contradiction. If, on the other hand, y is a nonleaf node
whose subtree Ty represents a graph having an s,t-nonseparating pair,
then Ty contains an S-node with two leaves as children. These nodes also
represent an s,t-nonseparating pair in G', again a contradiction.

Now suppose e and f satisfy the conditions of the lemma, but that e and
f are not special. According to Lemma 7, z is an S-node. Because T is
compressed and no y is a leaf, each y is a P-node. The root of T' is a
P-node, implying that G' has no cut edge. Moreover, the root has exactly
two children, one of which is a virtual edge. Let x denote the other child.

If there is only one node y between e and f in T, then x is a P-node with
Tx = Ty; else x is an S-node with all the y as children. Because each Ty is
rooted at a P-node, no Ty has a cut edge and neither does Tx. Therefore
G' has no s,t-separating pairs. If T' has an s,t-nonseparating pair, then Tx
must have either an s,t-nonseparating pair or a pair of cut edges. This in
turn means that some Ty contains a cut edge or an s,t-nonseparating cut
edge pair, a contradiction. ∎

Corollary 2. If a biconnected 2TSP graph contains an s,t-nonseparating
pair, then it contains a special pair.

Proof. Let G be a biconnected 2TSP graph that contains one or more
s,t-nonseparating pairs. Let T denote a compressed decomposition tree
for G. By Lemma 6, T must contain an S-node whose children include at
least two leaf nodes. Among all such S-nodes, choose one, say x, such that
no other S-node in Tx has two or more leaf nodes as children. Lemma 6
implies that for each nonleaf child y of x, Ty cannot contain an
s,t-nonseparating pair. Now from among all the leaf nodes that are children
of x, choose two, say e and f, such that no child of x that occurs between
e and f is a leaf. By Lemma 9, e and f form a special pair. ∎

Lemma 10. Let G be a biconnected 2TSP graph that contains no special
pairs. Then G contains at most one s,t-separating pair.

Proof. We use contradiction. Suppose {e1, e2} and {e3, e4} are two
distinct s,t-separating pairs in G. Assume, without loss of generality, that
e1 ≠ e3 and e2 ≠ e3. Let G1 and G2 be 2TSP graphs such that G is the
result of their parallel composition. From Tables 1 and 2, we see that any
s,t-separating pair in G consists of a cut edge in G1 and a cut edge in G2.
Thus each of e1, e2, and e3 is a cut edge in either G1 or G2. Without loss
of generality, assume that two of these three edges are in G1. But now, by
Table 2, these two edges constitute an s,t-nonseparating pair in G,
contradicting Lemma 9. ∎

Recall  that  it  suffices  to  find  three-edge-connected  components  of 
biconnected  graphs,  because  each  cut  edge  and  cut  edge  pair  in  any  graph 
is  contained  within  one  of  its  biconnected  components.  Lemmas  5  through 
9  demonstrate  that  remove_non_sep  correctly  finds  special  pairs,  adds 
virtual  edges,  and  updates  the  decomposition  tree  to  represent  the  graph 
left after the edges are removed. By Lemma 10, at most one s,t-separating
pair  remains  in  the  graph.  This  pair,  if  it  exists,  is  found  and  removed  using 
remove_sep.  We  omit  the  analysis  of  this  last  step,  which  at  this  point  is 
relatively  straightforward. 
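The special-pair search that Lemma 9 and Corollary 2 describe can be sketched as follows. This is an illustrative sketch only, not the paper's remove_non_sep: the tuple encoding of tree nodes and every function name are hypothetical, and the code assumes (as the proofs above use Lemma 6) that a subtree represents a graph with an s,t-nonseparating pair exactly when it contains an S-node with two or more leaf children.

```python
def is_leaf(node):
    return node[0] == "leaf"          # nodes: ("leaf", edge) or (kind, children)

def has_s_node_with_two_leaves(node):
    """True if the subtree contains an S-node with >= 2 leaf children."""
    if is_leaf(node):
        return False
    kind, kids = node
    if kind == "S" and sum(is_leaf(k) for k in kids) >= 2:
        return True
    return any(has_s_node_with_two_leaves(k) for k in kids)

def is_special_pair(s_node, i, j):
    """Lemma 9 test for leaf children at positions i < j of an S-node."""
    kind, kids = s_node
    assert kind == "S" and is_leaf(kids[i]) and is_leaf(kids[j])
    return all(not is_leaf(y) and not has_s_node_with_two_leaves(y)
               for y in kids[i + 1:j])

def find_special_pair(root):
    """Corollary 2 selection: return (s_node, i, j), or None if no pair exists."""
    if is_leaf(root):
        return None
    kind, kids = root
    for k in kids:                    # prefer a qualifying S-node deeper in the tree
        found = find_special_pair(k)
        if found:
            return found
    if kind == "S":
        leaves = [idx for idx, k in enumerate(kids) if is_leaf(k)]
        if len(leaves) >= 2:
            # Two consecutive leaf children have no leaf between them.
            return (root, leaves[0], leaves[1])
    return None
```

Searching children before the node itself mirrors the choice of an S-node x such that no other S-node in Tx has two or more leaf children.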

3.4.  Algorithm  test 

Algorithm test is the heart of our method. The input to test is a
three-edge-connected series-parallel multigraph. In such a graph, suppose
v is a vertex with exactly two neighbors, u and w, and suppose there is only
one copy of the edge vw. (Thus there are at least two copies of uv by
three-edge-connectivity.) We say that v is pruned if the multiplicity of uv
is set to 2. Similarly, we say a graph is pruned if each vertex fitting the
profile of v is pruned.
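Pruning can be illustrated with a short Python sketch. The representation is an assumption made for this example, not something the paper prescribes: a multigraph is a dict mapping each vertex to a Counter of neighbor multiplicities, and make_graph and prune are hypothetical helper names.

```python
from collections import Counter

def make_graph(edges):
    """Build a symmetric multigraph from (a, b, multiplicity) triples."""
    adj = {}
    for a, b, m in edges:
        adj.setdefault(a, Counter())[b] += m
        adj.setdefault(b, Counter())[a] += m
    return adj

def prune(adj):
    """Prune every vertex v that has exactly two neighbors u and w with a
    single copy of vw: the multiplicity of uv is set to 2 (in place)."""
    for v, nbrs in adj.items():
        if len(nbrs) != 2:
            continue
        (a, ma), (b, mb) = nbrs.items()
        u, w = (a, b) if ma >= mb else (b, a)   # u carries the heavier edge
        if adj[v][w] == 1 and adj[v][u] > 2:
            adj[v][u] = adj[u][v] = 2
    return adj
```

Only multiplicities change, never the neighbor sets, so pruning one vertex cannot make another vertex start or stop fitting the profile.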

algorithm test(G)

input: a three-edge-connected series-parallel multigraph G
output: YES, if G contains an immersed K4, NO otherwise
begin
    for each vertex v in G with exactly one neighbor do
        delete all but three copies of edges incident on v

    if any cut point in G has degree 7 or more
        then output YES and halt

    for each biconnected component B with four or more vertices do
    begin
        prune B
        if there is a vertex in B with degree 5 or more
            then output YES and halt
    end

    output NO and halt
end

Theorem  1.  Algorithm  test  runs  in  O(n)  time  on  a  graph  with  n  vertices. 

Proof.  Biconnected  components  and  cut  points  can  be  found  in  linear 
time  using  a  depth-first  search.  Operations  such  as  deleting  edges  incident 
on  vertices  with  only  one  neighbor  and  pruning  vertices  with  two  neighbors 
can  be  accomplished  in  constant  time.  The  existence  of  a  vertex  with 
degree  5  or  more  can  be  confirmed  simply  by  scanning  the  list  of  edges. 
Thus  the  theorem  holds.  | 
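For concreteness, the steps of test can be sketched in Python. This is a hedged illustration, not the authors' implementation: the dict-of-Counter multigraph encoding and all function names are assumptions, the input is trusted to be three-edge-connected and series-parallel, and the recursive search and per-block copying make no attempt to match the O(n) bound.

```python
from collections import Counter

def make_graph(edges):
    adj = {}
    for a, b, m in edges:                      # (endpoint, endpoint, multiplicity)
        adj.setdefault(a, Counter())[b] += m
        adj.setdefault(b, Counter())[a] += m
    return adj

def degree(adj, v):
    return sum(adj[v].values())

def biconnected_components(adj):
    """Return (vertex sets of the blocks, set of cut points) by depth-first search."""
    index, low, stack, comps, cuts = {}, {}, [], [], set()

    def dfs(v, parent):
        index[v] = low[v] = len(index)
        children = 0
        for w in adj[v]:
            if w not in index:                 # tree edge
                stack.append((v, w))
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if low[w] >= index[v]:         # v closes off a block
                    if parent is not None or children > 1:
                        cuts.add(v)
                    comp = set()
                    while True:
                        a, b = stack.pop()
                        comp |= {a, b}
                        if (a, b) == (v, w):
                            break
                    comps.append(comp)
            elif w != parent and index[w] < index[v]:   # back edge to an ancestor
                stack.append((v, w))
                low[v] = min(low[v], index[w])

    for v in adj:
        if v not in index:
            dfs(v, None)
    return comps, cuts

def prune(adj):
    for v, nbrs in adj.items():
        if len(nbrs) == 2:
            (a, ma), (b, mb) = nbrs.items()
            u, w = (a, b) if ma >= mb else (b, a)
            if adj[v][w] == 1 and adj[v][u] > 2:
                adj[v][u] = adj[u][v] = 2

def k4_immersed(adj):
    """Sketch of algorithm test; adj is modified in place."""
    for v in adj:                              # step 1: trim to three copies
        if len(adj[v]) == 1:
            [(w, m)] = adj[v].items()
            if m > 3:
                adj[v][w] = adj[w][v] = 3
    comps, cuts = biconnected_components(adj)
    if any(degree(adj, c) >= 7 for c in cuts): # step 2: heavy cut point
        return True
    for comp in comps:                         # step 3: prune blocks, check degrees
        if len(comp) < 4:
            continue
        block = {v: Counter({w: m for w, m in adj[v].items() if w in comp})
                 for v in comp}
        prune(block)
        if any(degree(block, v) >= 5 for v in block):
            return True
    return False
```

A star whose three edges each have multiplicity 3 is accepted at the cut-point step, while a 4-cycle with every edge doubled has maximum degree 4 and is rejected.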

3.5.  The  Correctness  of  test 

The  correctness  of  test  relies  on  a  number  of  lemmas,  which  follow. 
Before  proceeding,  we  make  a  few  useful  observations. 

Observation 1. If H' is immersed in H, and if M' is a K4 model in H',
then in H there is a K4 model M with the same corners as M'.

Observation 1 follows from noting that edges in H' map to edge-disjoint
paths in H, and that in H a suitable K4 model can be found merely by
replacing the edges of M' with their image paths in H.

Observation  2  is  a  well-known  property  of  series-parallel  graphs  (see 
[13],  for  example,  for  a  proof). 

Observation 2. Every biconnected series-parallel multigraph with four or
more vertices contains at least two nonadjacent vertices with exactly two
neighbors.

Suppose  v  is  a  vertex  with  exactly  two  neighbors,  x  and  y.  We  say  that  v 
is  shorted  if  we  lift  all  pairs  of  edges  vx  and  vy  and  if  we  delete  any 
remaining  edges  incident  on  v  along  with  v  itself. 
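Shorting can be made concrete with a small Python sketch. The multigraph encoding (a dict mapping each vertex to a Counter of neighbor multiplicities) and the names make_graph and short are assumptions made for illustration only.

```python
from collections import Counter

def make_graph(edges):
    """Build a symmetric multigraph from (a, b, multiplicity) triples."""
    adj = {}
    for a, b, m in edges:
        adj.setdefault(a, Counter())[b] += m
        adj.setdefault(b, Counter())[a] += m
    return adj

def short(adj, v):
    """Short vertex v, which must have exactly two neighbors x and y."""
    (x, mx), (y, my) = adj[v].items()   # ValueError unless exactly two neighbors
    lifted = min(mx, my)                # number of pairs (vx, vy) that can be lifted
    adj[x][y] += lifted                 # each lifted pair becomes one copy of xy
    adj[y][x] += lifted
    del adj[x][v], adj[y][v], adj[v]    # drop leftover edges at v and v itself
    return adj
```

If vx has multiplicity 3 and vy has multiplicity 2, two pairs are lifted, two copies of xy are added, and the one leftover copy of vx is deleted with v.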

Observation 3. Shorting preserves biconnectivity, three-edge-connectivity,
and series-parallelness.

Observation  3  holds  because  shorting  a  vertex  does  not  change  the 
number  of  vertex-disjoint  or  edge-disjoint  paths  between  any  pair  of 
remaining  vertices. 

Observation 4. A biconnected component of a three-edge-connected
graph is three-edge-connected.

Observation 4 follows from noting that edge-disjoint paths may as well
be made simple and that, whenever a pair of vertices lies in the same
biconnected component, all vertices along simple paths connecting them in
the original graph must also lie in this component.

The next lemma justifies the first step in test, in which edges incident on
vertices with only one neighbor have their multiplicity reduced to 3.

Lemma 11. Let G denote a graph in which a vertex, v, has exactly one
neighbor, w. Let G' be obtained from G by deleting all but three copies of the
edge vw. Then K4 is immersed in G if and only if K4 is immersed in G'.

Proof. If K4 is immersed in G, then G contains a K4 model whose
edge images are simple paths. Because v cannot be an intermediate vertex
in a simple path, at most three copies of the edge vw are needed. Thus K4
is also immersed in G'. If K4 is not immersed in G, then neither is it
immersed in G' because G' is a subgraph of G. ∎

Under  certain  conditions,  it  is  possible  to  detect  an  immersed  K4  by 
checking  whether  the  graph  in  Fig.  3,  henceforth  termed  graph  M ,  is 
immersed  in  the  input  graph.  The  next  four  lemmas  are  useful  in  finding 
M-models. 

Lemma 12. Let G be three-edge-connected, with noncut point vertices u
and v. Let w ∉ {u, v} denote a vertex in G. Then there exist three mutually
edge-disjoint paths, each beginning with w and ending with either u or v, such
that at most two of these paths contain u, at most two contain v, and none
contains both u and v.

Proof. The paths we seek to identify are illustrated in Fig. 4, with
dashed lines denoting edge-disjoint paths that do not contain u or v as an
intermediate vertex. Consider three mutually edge-disjoint paths P1, P2,
and P3, each from w to {u, v}. These paths exist because G is
three-edge-connected. Assume all three contain, say, u. Hence all three may
as well be simple and end at u. Consider now some path P between w and v
that does not contain u (such a path exists because u is not a cut point).
P may contain vertices and edges in P1, P2, and P3. Let y be the last vertex
in P (counting from w) that is also in P1 or P2 or P3. Without loss of
generality, assume y is in P1. We can construct a path P' from w to v by
taking P1 until we reach y, and using P from there on. Thus P', P2, and
P3 are the desired edge-disjoint paths, with P' not containing u. ∎

FIG. 3. The graph of M.

FIG. 4. Edge-disjoint paths in a three-edge-connected graph.

Lemma 13. Let G be three-edge-connected. Let v denote a noncut point
vertex in G with degree at least 4, let u and w be neighbors of v, and suppose
uv has multiplicity at least 2. Then G contains an M model, with corners u,
v, and w, and with v the image of M's degree-4 vertex.

Proof. We restrict our attention to G', the biconnected component of
G containing v. (G' is three-edge-connected by Observation 4. Because v
is not a cut point, its neighborhood is unchanged in G'.) From Lemma 12,
we know that there are three mutually edge-disjoint paths from w to {u, v}
such that at most two of these paths contain u and at most two of these
paths contain v. One of these paths is the edge wv. If one of the other
paths contains v as well, the lemma holds. So suppose neither contains v.
This situation is illustrated in Fig. 5a. To complete an M model, we must
find an edge-disjoint path [vw]. If uv has multiplicity 3 or more, we can
construct this path by combining one of the edges vu and one of the paths
[uw]. So assume uv has multiplicity 2, and let x denote a neighbor of v
other than u or w. Because G' is biconnected, there is a path [xw] that
does not contain any of the edges incident on v. Let y denote the first
vertex on this path (counting from x) common to either of the two paths
[uw]. We can combine the edge vx with the paths [xy] and [yw] to get the
desired path [wv], as is clear from Fig. 5b. ∎


FIG. 5. Graphs used in the proof of Lemma 13.

Lemma 14. Let G be three-edge-connected, with at least three vertices. Let
v denote a vertex in G with only one neighbor u, and let w ≠ v denote a
neighbor of u. Suppose v has degree at least 4. Then G contains an M model,
with corners u, v, and w, and with v the image of M's degree-4 vertex.

Proof. The graph depicted in Fig. 6 is immersed in G. A satisfactory
model of M can be obtained by lifting two pairs of edges in this graph. ∎

Lemma 15. Let G be three-edge-connected and series-parallel, with at
least three vertices. Let v denote a vertex in G with degree at least 4. Then G
contains an M model in which v is the image of M's degree-4 vertex.

Proof. If v has only one neighbor, then the model exists by Lemma 14.
Otherwise let u and w denote arbitrary neighbors of v. If v is a cut point,
then the graph in Fig. 7 is immersed in G, satisfying the statement of the
lemma. So suppose v is not a cut point. If any vertex in G is adjacent to v
by two or more edges, then Lemma 13 applies, and the model exists. So
suppose there are no edges of multiplicity greater than 1 incident on v.
Then we may force such an edge to appear by iteratively deleting vertices
with only one neighbor and shorting vertices with only two neighbors.
Neither operation changes the degree of v, or affects the
three-edge-connectivity and series-parallelness of G. Thus, by the time the
order of G is reduced to 3 (or before), some edge incident on v must have
multiplicity 2 or more. Lemma 13 and Observation 1 then imply that the
lemma holds. ∎

The preceding lemmas enable us to detect K4 models that span cut
points.

Lemma 16. Let G be three-edge-connected and series-parallel, and let all
vertices in G with exactly one neighbor have degree 3. Suppose G has a cut
point v with degree 7 or more. Then K4 is immersed in G.

Proof. Let C_1, ..., C_k denote the connected components of G - {v}.
Let A_i denote C_i augmented with a copy of v and the edges it induces.
Each A_i is three-edge-connected, and thus contains a model of the
triple-edge shown in Fig. 8a, with any pair of vertices serving as the
corners. Without loss of generality, assume A_1 contains the least number
of edges incident on v, and let H denote G - C_1. It follows that v has
degree 4 or more in H and that H has at least three vertices. Thus, by
Lemma 15, there is an M model in H with v the image of the degree-4
vertex in M. This M model can be combined with a model of the
triple-edge in A_1 to form in G a model of the graph shown in Fig. 8b,
which contains an immersed K4. ∎

FIG. 6. Graph used in the proof of Lemma 14.

FIG. 7. Graph used in the proof of Lemma 15.

K4  models  that  span  cut  points  may  exist  even  if  no  cut  points  have 
degree  7  or  more.  Nevertheless,  we  can  restrict  our  search  to  biconnected 
components,  as  we  show  in  the  following  text. 

Lemma  17.  Suppose  G  has  no  cut  point  with  degree  exceeding  6.  Then  K4 
is  immersed  in  G  if  and  only  if  K4  is  immersed  in  a  biconnected  component 
of  G. 

Proof. If a biconnected component of G contains K4, then so does G,
because a biconnected component is a subgraph. To prove the converse,
consider a K4 model in G with the K4 edges mapped to simple paths. Let
u, w, x, and y denote the corners of this model, and suppose there is a cut
point v that separates them. We know that v cannot be one of the
corners, else it would need degree 7 or more (see Fig. 9a). v cannot
separate two corners from the others, else it would need degree 8 or more
(see Fig. 9b). So it must be that v separates just one corner, say y, from
the others (see Fig. 9c). Thus the edge-disjoint paths [uy], [wy], and [xy] all
pass through v, and we can construct another K4 model in which v
replaces y as a corner. By iterating this replacement, we eventually get a
K4 model all of whose corners (and paths) are in the same biconnected
component. ∎


FIG. 8. Graphs used in the proof of Lemma 16.

FIG. 9. Models of K4 that span a cut point.


The  remaining  lemmas  in  this  section  deal  with  detecting  an  immersed 
K4  in  a  biconnected,  three-edge-connected,  series-parallel  graph.  First  we 
show  that  such  a  graph  must  have  at  least  one  vertex  of  degree  5  or  more 
if  it  is  to  contain  an  immersed  K4 .  Then  we  present  a  series  of  lemmas 
that  lead  up  to  a  proof  of  the  converse;  that  is,  if  even  one  vertex  has 
degree  5  or  more,  then  an  immersed  K4  exists. 

Lemma 18. Let v denote a vertex with exactly two neighbors, u and w, and
suppose the edge vw has multiplicity 1. Then u and v can be corners of a given
K4 model only if degree(u) ≥ degree(v) + 2.

Proof. Let x and y denote the other corners of this model. Paths [ux]
and [uy] need not contain uv. Either [vx] or [vy] has to pass through u.
Thus at least three edges are incident on u in addition to the copies of uv
(see Fig. 10), and the lemma follows. ∎

Lemma 19. If G is series-parallel and of maximum degree 4, then K4 is
not immersed in G.

Proof. Suppose otherwise, and let H denote a minimal counterexample.
H must be three-edge-connected by Lemma 1. H must also be
biconnected, because a cut point in a three-edge-connected graph has
degree at least 6. Thus, by Observation 2, H contains a vertex, v, with
exactly two neighbors, u and w. It must be that v is needed as a corner in
every K4 model, else we can short it, contradicting minimality. So v has
degree 3 and we assume, without loss of generality, that uv has multiplicity
2, vw has multiplicity 1. We now fix the remaining corners of some K4
model. Vertex u cannot be one of these corners, by Lemma 18. But now it
is easy to see that u can replace v in this model, contradicting the fact that
v must be a corner. ∎

FIG. 10. A model of K4 with corners u, v, x, and y.

We henceforth use the term candidate graph to denote a biconnected,
three-edge-connected, series-parallel multigraph with four or more
vertices.

Lemma 20. In a candidate graph, G, suppose vertex v has exactly two
neighbors, u and w, and suppose the multiplicity of uv is greater than the
multiplicity of vw. If degree(u) - degree(v) ≥ 2, then K4 is immersed in G.

Proof. Suppose otherwise, and let H denote a minimal counterexample.
Let x ≠ v denote another vertex with exactly two neighbors. The edge
xu must exist and have multiplicity 2 or more, else we can short x without
affecting the degree of u or v, thus contradicting the minimality of H.
Consider the effect of shorting v, producing the graph H'. Because u has
degree at least 4 in H', we know from Lemma 13 that the M model
illustrated in Fig. 11a is immersed in H'. But this means that the graph
shown in Fig. 11b, which contains K4, is immersed in H, thereby
contradicting the assumption that H is a counterexample. ∎
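The degree bound on u after shorting v is used without computation in the proof. One way to see it, writing a and b for the multiplicities of uv and vw (symbols introduced here for illustration, not taken from the paper), is that shorting v removes all a copies of uv and creates b new copies of uw:

```latex
\begin{align*}
a &= \operatorname{mult}(uv), \qquad b = \operatorname{mult}(vw), \qquad a > b \ge 1,\\
\deg_H(v) &= a + b, \qquad \deg_H(u) \ge \deg_H(v) + 2 = a + b + 2,\\
\deg_{H'}(u) &= \deg_H(u) - a + b \;\ge\; (a + b + 2) - a + b \;=\; 2b + 2 \;\ge\; 4.
\end{align*}
```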


FIG. 11. Graphs used in the proofs of Lemmas 20 and 22.

Recall pruning, as defined in Section 3.4.

Lemma 21. In a candidate graph, G, suppose vertex v has exactly two
neighbors, u and w, and suppose vw has multiplicity 1. Letting G' denote the
graph resulting from pruning v, K4 is immersed in G if and only if it is
immersed in G'.

Proof. If K4 is immersed in G', then it is immersed in G as well,
because G' ⊆ G. Suppose K4 is immersed in G. If G contains a K4 model
in which v is not a corner, then so does G', because pruning is irrelevant
(at most one of the images of the K4 edges in this model can pass through
v). So suppose v is a corner in every K4 model in G. Vertex u must also
be a corner in all these models, else we could replace v with u, forming a
model in which v is not a corner. Now, by Lemma 18, u has degree at least
2 more than v, a property unchanged by pruning. Thus, by Lemma 20, K4
is immersed in G'. ∎

Lemma 22. In a pruned candidate graph, G, suppose vertex v has exactly
two neighbors, u and w, and suppose uv has multiplicity at least 3. Then there
is a K4 model in G with corners u, v, w, and x, where x ∉ {v, w} is a neighbor
of u.

Proof. Vertex u must have some neighbor not in {v, w} to play the role
of x, else w would be a cut point, contradicting the biconnectivity of G.
Because G is pruned, edge vw must have multiplicity at least 2, and the
degree of u must be at least two more than the multiplicity of uv. Thus in
G', the graph that results from shorting v, the degree of u is at least 4. We
conclude from Lemma 13 that the M model illustrated in Fig. 11a is
immersed in G', and the graph shown in Fig. 11b, which contains K4, is
immersed in G. ∎

Lemma 23. In a candidate graph, G, suppose vertex v has exactly two
neighbors, u and w, suppose uv and vw each have multiplicity at least 2, and
suppose uw exists. Then there is a K4 model in G with corners u, v, w, and x,
where x ∉ {v, w} is a neighbor of u.

Proof. As in the last lemma, such an x must exist. We apply Lemma
12, with w playing the role of v and x playing the role of w. Thus at least
one of the graphs shown in Fig. 12, both of which contain K4, is immersed
in G. ∎

Lemma 24. Let G denote a pruned candidate graph. K4 is immersed in G
if and only if G has a vertex of degree 5 or more.

Proof. We know from Lemma 19 that a candidate graph of maximum
degree 4 contains no K4. To prove the converse, we proceed by
contradiction and we assume H denotes a minimal pruned candidate graph,
with at least one vertex of degree 5 or more, but with no immersed K4. We
observe that H must contain more than four vertices. Otherwise, let a
denote a vertex in H with degree 5 or more. If a has only two neighbors,
then the graph of Fig. 13a, which contains K4, is a subgraph of H,
contradicting our assumption that H contains no K4. If a has three
neighbors, then the graph of Fig. 13b, which also contains K4, is a
subgraph of H, leading once more to a contradiction. Thus H has at least
five vertices, a necessary property because we will use shorting to
contradict minimality, and a candidate graph requires at least four vertices.
Let v denote a vertex in H with exactly two neighbors, u and w, and assume
the multiplicity of uv is at least that of vw. Lemma 22 guarantees that v
cannot have degree 5 or more. If v has degree 4, Lemma 23 and the fact
that H is pruned ensure that uw does not exist. But now we can short v,
obtaining a pruned candidate graph that contradicts minimality. So v must
have degree 3 and, by Lemma 20, u has degree 4 or less. Biconnectivity
requires that at most one copy of uw exists. But now we can again short v
to obtain a pruned candidate graph, contradicting the presumed
minimality of H. ∎

FIG. 12. Graphs used in the proof of Lemma 23.

This  completes  the  proof  of  the  correctness  of  test.  The  work  of  the  last 
two  sections  provides  the  proof  of  the  following  principal  result. 


FIG. 13. Pruned four-vertex graphs that contain a vertex of degree 5.



Theorem  2.  Algorithms  decompose,  components  and  test  correctly 
decide  in  linear  time  whether  K4  is  immersed  in  an  arbitrary  input  graph. 


4. FINDING A MODEL

Once  the  presence  of  K4  has  been  detected  in  a  graph,  our  method  to 
identify  a  K4  model  proceeds  in  two  steps.  Algorithm  corners  is  first 
invoked  to  modify  the  input  graph  until  an  appropriate  set  of  corners  is 
isolated.  Then  algorithm  paths  is  used  to  find  the  K4  edge  images. 

4.1.  Algorithm  corners 

Algorithm  corners  marks  vertices  in  the  input  graph  as  part  of  the 
corner-finding  process.  All  vertices  are  assumed  to  be  unmarked  initially. 
Algorithm  corners  also  maintains  a  list  for  every  copy  of  every  edge,  to 
store  the  sequence  of  edges  that  may  have  been  eliminated  by  shorting. 
Each  list  is  assumed  to  contain  only  the  edge  itself  initially. 

algorithm corners(G)

input: a three-edge-connected series-parallel multigraph G containing an
    immersed K4
output: the four corners of a K4 model in G
begin
    for each vertex with only one neighbor
        delete all but three copies of its incident edge

    if G has a cut point v of degree 7 or more
        then u, v, w, x := spanning-corners(G, v)
    else begin
        prune each biconnected component of G
        C := a biconnected component with four or more vertices of which
            at least one has degree 5 or more
        u, v, w, x := biconnected-corners(C)
    end

    output u, v, w, x
end

We address the correctness of corners. Suppose G contains a cut point
v of degree at least 7, after redundant edges incident on vertices with only
one neighbor are deleted. In this case, we use algorithm spanning-corners
to locate the corners.

algorithm spanning-corners(G, v)

input: a three-edge-connected series-parallel multigraph G in which each
    vertex with exactly one neighbor has degree 3 and a cut point v with
    degree 7 or more
output: the four corners of a K4 model in G
begin
    if G - {v} has three or more connected components
    then /* Case 1a */
        u, w, x := neighbors of v in G, each in a different connected
            component of G - {v}
    else begin /* Case 1b */
        C1, C2 := the connected components of G - {v}
        A1 := C1 augmented with v and the edges it induces
        A2 := C2 augmented with v and the edges it induces
        if v has degree 4 or more in A1
            then A := A1 and B := A2
            else A := A2 and B := A1

        while v induces no edge of multiplicity 2 or more in A
            if there is a vertex in A with only one neighbor
                then delete this vertex and its incident edges
                else short some vertex in A with only two neighbors
        u := some vertex in A such that uv has multiplicity at least 2
        if v has no neighbor other than u in A
            then w := any neighbor of u in A other than v
            else w := any neighbor of v in A other than u
        x := any neighbor of v in B
    end

    output u, v, w, x
end

If G - {v} has three or more connected components (Case 1a), it
follows from the three-edge-connectivity of G that a model of the star
graph shown in Fig. 1 exists in G, with v playing the role of the central
vertex. Any three vertices in G - {v} can serve as the remaining corners,
as long as no two of them are in the same connected component of
G - {v}. Thus the vertices returned are the corners of a K4 model.
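Case 1a amounts to picking one neighbor of v from each of three different components of G - {v}. A minimal Python sketch follows; the adjacency encoding (a dict mapping each vertex to a dict of neighbor multiplicities) and both function names are assumptions made for illustration, and G is taken to be connected so that every component contains a neighbor of v.

```python
def components_without(adj, v):
    """Connected components (as vertex sets) of the graph with v removed."""
    seen, comps = {v}, []
    for s in adj:
        if s in seen:
            continue
        comp, todo = set(), [s]
        seen.add(s)
        while todo:                    # iterative depth-first search
            a = todo.pop()
            comp.add(a)
            for b in adj[a]:
                if b not in seen:
                    seen.add(b)
                    todo.append(b)
        comps.append(comp)
    return comps

def case_1a_corners(adj, v):
    """Return corners (u, v, w, x) for Case 1a, or None if G - {v} has
    fewer than three connected components."""
    comps = components_without(adj, v)
    if len(comps) < 3:
        return None
    # One neighbor of v from each of the first three components.
    u, w, x = (next(n for n in adj[v] if n in comp) for comp in comps[:3])
    return (u, v, w, x)
```

When fewer than three components exist the sketch returns None, which corresponds to falling through to Case 1b.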

If G - {v} has only two components (Case 1b), then we apply Lemmas
13 and 14. If v has only one neighbor u in the augmented component A
(see algorithm spanning-corners), then by Lemma 14, we may choose v,
u, and an arbitrary neighbor w of u as the corners of an M model. If v
has two or more neighbors, then the corners of M can be found using
Lemma 13 as long as an edge of multiplicity 2 or more is incident on v in
A (note that v cannot be a cut point in A). As observed in the proof of
Lemma 15, if this condition is not initially satisfied, it can easily be forced
by deleting and shorting vertices. The corners of the M model serve as
three of the four corners of a K4 model. The vertex x chosen as the
fourth corner belongs to the connected component of G - {v} that does
not contain the first three corners. By the three-edge-connectivity of G, a
model of the triple-edge exists in G, with v and x as corners. Combining
this model with the M model, we obtain a model of the graph shown in
Fig. 8b. Lifting two pairs of edges in this graph gives us K4.

If G contains no cut point of degree 7 or more, then
biconnected-corners is invoked on some pruned biconnected component of G
that contains at least one vertex of degree 5 or more. Lemma 24 implies that
such a biconnected component exists, and moreover, that it contains an
immersed K4. This component must contain a vertex with exactly two
neighbors (Observation 2). In this event, we employ Lemmas 22 and 23,
plus Lemma 25, which follows.

algorithm biconnected-corners(G)

input: a three-edge-connected biconnected pruned series-parallel
       multigraph G with at least four vertices and with at least one
       vertex of degree 4 or more
output: the four corners of a K4 model in G
begin
    while x has not been assigned a value do
    begin
        v := an unmarked vertex with exactly two neighbors
        u, w := the neighbors of v, with the multiplicity of uv at least
                that of vw
        if v has degree at least 5, or v has degree 4 and uw exists
        then /* Case 2a */
            x := any neighbor of u besides v or w
        else if u or v has degree 4
        then short v
        else if uw exists
        then /* Case 2b */
            x := any neighbor of u other than v or w
        else if there is an edge ua, a ≠ v, of multiplicity 2 or more
        then /* Case 2c */
            x := a
        else if there are two vertices of degree 5 or more
        then short v else mark v
    end
    output u, v, w, x
end


Lemma 25. In a candidate graph, G, suppose vertex v has exactly two
neighbors, u and w, and suppose uv has multiplicity 2, vw has multiplicity 1,
and u has degree at least 5. Let x denote a neighbor of u other than v or w. If
ux has multiplicity at least 2 or uw exists, then there is a K4 model in G with
u, v, w, and x as corners.

Proof. In G', the graph resulting from shorting v, u has degree at least
4, and either ux has multiplicity at least 2 or uw now does. Then by
Lemma 13, there is an M model in G' with corners u, w, and x, and with
u the image of M's degree-4 vertex. Thus the graph in Fig. 11b, which
contains the desired K4 model, is immersed in G. ∎

Let v be defined as in algorithm biconnected-corners. The corners of
an immersed K4 can be found if one of the following conditions holds:

• either v has degree at least 5 or v has degree 4 and edge uw exists
(Case 2a). Lemma 22 applies in the former situation, and Lemma 23 in the
latter situation.

• v has degree 3, u has degree 5 or more, and either edge uw exists
or there is an edge ua, a ≠ v, of multiplicity 2 or more (Cases 2b and c).
Lemma 25 applies.

If an immersed K4 cannot yet be identified, then a vertex, v, with
exactly two neighbors is shorted as long as the resulting graph retains at
least one vertex of degree at least 5. Accordingly, if one of v's neighbors,
u, is the only vertex of degree at least 5, uv has multiplicity 2, and all other
edges incident on u are simple, then v cannot be shorted. It suffices in this
case to mark v as having been visited, because at most one vertex can be
so marked and another candidate for v is always available. Finally, we
note that continued shorting will never result in a graph with fewer than
four vertices. This is because, as observed in the proof of Lemma 24, one
of the two graphs shown in Fig. 13 must be a subgraph of any candidate
graph that contains exactly four vertices of which at least one has degree 5
or more. Because one of Cases 2a-c applies to any vertex with exactly two
neighbors in these two graphs, corners are identified at this point without
any vertex being marked or shorted. Of course, it is possible that corners
are found before the graph is reduced to four vertices.

Biconnected components and cut points can be found (using a depth-first
search) and vertices pruned, in linear time. It takes only linear time to
check whether a biconnected component contains a vertex of degree 5 or
more. In each iteration of spanning-corners, some vertex with two or
fewer neighbors is deleted or shorted. In each iteration of
biconnected-corners, some vertex with exactly two neighbors is shorted or
marked. Vertices with two or fewer neighbors can be maintained using a queue.
Deleting, shorting, or marking such a vertex, and updating the queue and
the appropriate edge list require only a constant number of steps. Thus
both spanning-corners and biconnected-corners run in linear time, and
hence, so does corners.
4.2. Algorithm paths

Algorithm  paths  uses  the  property  that  k  edge-disjoint  paths  exist 
between  a  pair  of  vertices  if  and  only  if  a  network  flow  of  value  k  is 
possible  between  them. 

algorithm paths(G, s, t1, ..., tk)

input: a multigraph G and distinguished vertices s, t1, ..., tk
output: edge-disjoint paths p1, ..., pk, with pi connecting s to ti, if such
        paths exist
begin
    G' := the edge-weighted digraph obtained by replacing each edge uv of
          multiplicity m with the directed edges (u, v) and (v, u), each of
          capacity m
    add to G' a vertex t and the edges (t1, t), ..., (tk, t), each of capacity 1
    find a flow of value k from s to t, if such a flow exists
    if there is such a flow then
    begin
        for each edge (u, v) in G' do
            if both (u, v) and (v, u) have positive flow values
            then flow((u, v)) := max{0, flow((u, v)) - flow((v, u))} and
                 flow((v, u)) := max{0, flow((v, u)) - flow((u, v))}
        discard from G' any edge without a positive flow
        for i = 1 to k do
        begin
            p'i := a path in G' from s to ti
            pi := the corresponding path in G
            decrement in G' the flow along each edge in p'i by one
            delete from G one copy of each edge in pi
        end
        output p1, ..., pk
    end
end
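The flow-based procedure above can be illustrated with a minimal Python sketch. It is not the authors' C implementation: the dictionary-based multigraph encoding, the helper names bfs and push, and the use of breadth-first unit augmentations are all assumptions made for illustration. The sketch stores a net (skew-symmetric) flow, so the cancellation of opposing flows on (u, v) and (v, u) happens implicitly.

```python
from collections import defaultdict, deque

def paths(edges, s, targets):
    """Return edge-disjoint paths from s to each target, or None if impossible.

    edges:   dict mapping an undirected edge (a, b) to its multiplicity
    targets: list [t1, ..., tk]; a vertex may be listed more than once
    """
    sink = object()                  # the added vertex t (hypothetical encoding)
    cap = defaultdict(int)           # arc capacities of G'
    for (a, b), m in edges.items():
        cap[a, b] += m
        cap[b, a] += m
    for t in targets:
        cap[t, sink] += 1
    adj = defaultdict(set)
    for a, b in list(cap):
        adj[a].add(b)
        adj[b].add(a)
    net = defaultdict(int)           # net flow, with net[a, b] == -net[b, a]

    def bfs(goal, usable):
        """Shortest path from s to goal using only arcs accepted by usable."""
        parent = {s: None}
        queue = deque([s])
        while queue and goal not in parent:
            a = queue.popleft()
            for b in adj[a]:
                if b not in parent and usable(a, b):
                    parent[b] = a
                    queue.append(b)
        if goal not in parent:
            return None
        path = [goal]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path[::-1]

    def push(path, amount):
        """Add `amount` units of net flow along every arc of path."""
        for a, b in zip(path, path[1:]):
            net[a, b] += amount
            net[b, a] -= amount

    # One unit augmentation per target yields a flow of value k, if one exists.
    for _ in targets:
        augmenting = bfs(sink, lambda a, b: cap[a, b] - net[a, b] > 0)
        if augmenting is None:
            return None
        push(augmenting, 1)

    # Peel off one path per target from the arcs that carry positive flow.
    result = []
    for t in targets:
        p = bfs(t, lambda a, b: net[a, b] > 0)
        push(p + [sink], -1)
        result.append(p)
    return result
```

Each call performs 2k breadth-first searches, so for constant k the sketch runs in time linear in the size of G'.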

We  address  the  correctness  and  use  of  paths.  In  the  following  figures, 
paths  that  are  mere  edges  are  shown  as  solid  lines.  These  edges  are 
temporarily  deleted  so  that  paths  can  be  employed  to  find  additional  paths 
with  multiple  edges,  depicted  with  dashed  lines.  We  consider  various  cases. 

Case 1a. The K4 model spans a cut point v, and G - {v} has three or
more connected components.

Corners u, w, and x are in different connected components of G - {v},
and each corner is adjacent to v in G. Two paths need to be found
between each corner and v, to complete a model of the star graph in Fig. 1
and hence a K4 model. So three calls are made to paths, each with v
playing the role of s and k set to 2. See Fig. 14. The first call uses
u = t1 = t2; the second call uses w = t1 = t2; the third call uses x = t1 = t2.

FIG. 14. Paths to be found if G - {v} has three or more connected components.

Case 1b. The K4 model spans a cut point v, and G - {v} has only two
connected components.

See Fig. 15. Figure 15a depicts the case when the biconnected component
of G containing both u and v has at least one corner. In this case,
corners u, w, and x are all adjacent to v. Moreover, edge uv has
multiplicity 2 or more. Note that v separates x from u and w. To complete
a model of the graph shown in Fig. 8b, two paths between v and x in the
biconnected component containing v and x, and two paths between w and
{u, v} in the biconnected component containing u, v, and w must be found.
Thus two calls to paths are required. The first call uses w = s, u = t1, and
v = t2; the second call uses v = s and x = t1 = t2. Figure 15b depicts the
case when u and v are in a biconnected component by themselves. Corner
x is adjacent to v and corner w is adjacent to u. Again, two calls to paths
are required. The first call uses u = s, w = t1 = t2; the second call uses
v = s and x = t1 = t2.

FIG. 15. Paths to be found if G - {v} has only two connected components.
Cases 2a-c. The K4 model is in a single biconnected component. See
Fig. 16. Case 2a of algorithm corners is illustrated in Figs. 16a and b.
Cases 2b and c are illustrated in Figs. 16c and d, respectively. In each case,
one call to paths, with k set to two, suffices.

Recall that the input to paths has at most a linear number of edges and
no more than four copies of any edge. Thus it takes only linear time to
construct G' and to read off paths (using, for example, a breadth-first
search) after a flow of value k has been found. The running time of paths
is therefore dominated by the algorithm for finding network flows. So we
employ a flow method such as Ford-Fulkerson [6], which runs in linear
time as long as k is bounded by an integer constant and all edge capacities
are integers, as is the case here.

In summary, to find a K4 model we invoke corners once and paths at
most three times. Because both corners and paths are linear-time
algorithms, the entire model-finding process is accomplished in linear time.

Theorem 3. If K4 is immersed in an arbitrary series-parallel graph,
algorithms corners and paths correctly isolate a K4 model in linear time.


5.  DISCUSSION 

We have presented linear-time methods to detect if K4 is immersed in
an input graph, and to isolate a K4 model if any exist. We implemented
our algorithms in C and we ran them on a SUN SPARCstation 20.
Experiments on randomly generated graphs indicate that our algorithms
are practical, taking only seconds to process graphs with thousands of
vertices. The running time of the detection algorithm is affected mainly by
the size of the input graph. One might suspect that the distribution of
edges over vertices might also have an effect, but we sampled several
edge-probability distributions and we could find no noticeable differences.
On the other hand, the model-finding algorithm does appear to take
slightly longer on graphs in which we have forced corners to be connected
only by long paths. Even on such contrived instances, finding a model
takes no more than twice the average time for random graphs of similar
size.

We note that the detection algorithm can be efficiently parallelized.
Biconnected components and cut points can be found in O(log n) time on
a CRCW PRAM with O((m + n)α(m, n)/log n) processors [10], where
α(m, n) denotes the inverse of Ackermann's function. Deciding whether a
graph is series-parallel can be done in O(log² n + log m) time with
O(m + n) processors [11]. Recalling that graphs of interest have at most a
linear number of edges, a parallel version of decompose thus needs at
most O(log² n) time with O(n) processors. The triconnected components
algorithm of [10], modified slightly to find three-edge-connected
components [18], yields a parallel version of components that runs in O(log n)
time with O(n log log n/log n) processors. It is straightforward to
parallelize test so that it takes constant time with O(n) processors. Besides
finding cut points, which are available from components, the only
operations in test are pruning vertices with just two neighbors and deleting
edges incident on vertices with only one neighbor. Both can be
accomplished in constant time with O(n) processors. Thus it is possible to
determine whether a graph has an immersed K4 in O(log² n) time with
O(n) processors on the CRCW PRAM model. We did not implement this
scheme because many of the algorithms mentioned are highly impractical.
The problem of devising an efficient parallel model-finding method
remains open.

Fast immersion tests are of interest in their own right. In practice, they
also have potential as indicators of graph width metrics. To illustrate, we
return to the cutwidth problem, which has appeared in a wide variety of
VLSI applications (see, as examples, [7, 12]). Deciding whether a graph has
small cutwidth is an important part of many layout processes. Graphs
representing circuits are frequently series-parallel. More generally, they
tend to be sparse, with at most a linear number of edges, and of bounded
degree due to limitations on porting and fan-in/out. Integer weights are
used to model multiple edges in these applications, just as we have used
them here. The presence of an immersed K4 in such a graph guarantees
that it cannot have cutwidth 3. The absence of K4, however, merely
approximates its cutwidth at 3. In particular, such an absence says nothing
at all about how to find a layout of width 3 even if many should exist. To
solve this problem, our algorithms can be used in conjunction with
previously studied "self-reduction" techniques [1, 9] to search for a layout in
O(n²) time.
Many other combinatorial problems may benefit from fast immersion
tests. For example, a variety of load factor [8] problems can be decided by
a finite battery of immersion tests, including K4. A problem indirectly
approachable with this method is graph bisection. Bounded cutwidth is a
sufficient, but not a necessary, condition for bounded bisection width. For
problems such as these, there is interest in devising fast tests for other key
graphs [16, 17].
Recent advances in well-quasi-order theory have troubling consequences for those
who would equate tractability with polynomial-time complexity. In particular, there is no
guarantee that polynomial-time algorithms can be found just because a problem has been
shown to be decidable in polynomial time. We present techniques for dealing with this
unusual development. Our main results include a general construction strategy with which
low-degree polynomial-time algorithms can now be produced for almost all of the catalogued
algorithmic applications of well-quasi-order theory. We also prove that no such application
of this theory can settle P = NP nonconstructively by any established method of argument.
1. Introduction

Although complexity theory is formulated in terms of decision problems,
established techniques of algorithm design (with rare exception [Mi]) constructively
address, instead, corresponding search or optimization versions of the problem at
hand. In the vast majority of cases in which one knows that an algorithm exists
to decide a problem in polynomial time, one knows precisely what the promised
algorithm is. Furthermore, if the input is a "yes" instance, such an algorithm
uncovers natural evidence (that is, an answer to the search version of the problem)
as the basis for a positive decision.

* A preliminary version of this paper was presented at the "21st ACM Symposium on Theory of
Computing, Seattle, Washington, May 1989." This research is supported in part by the National Science
Foundation under grant MIP-8919312, by the Office of Naval Research under contract N00014-90-J-1855,
and by the Natural Sciences and Engineering Research Council of Canada under award 9820.
In contrast, advances in well-quasi-order theory, especially the seminal
contributions of Robertson and Seymour, provide new and powerful nonconstructive tools
for establishing polynomial-time decidability. These deep results suffer from some
challenging difficulties:

(1)  the  algorithms  involve  huge  constants  of  proportionality, 

(2)  the  complexity  of  associated  search  problems  is  not  established,  and 

(3)  there  is  no  general  means  for  finding  (or  even  recognizing)  correct 
algorithms. 

We  have  developed  a  number  of  general  techniques  for  dealing  with  each  of  these 
issues.  At  the  editor’s  request,  however,  we  suppress  further  discussion  of  issues  (1) 
and  (2)  here.  In  the  sequel,  we  concentrate  only  on  issue  (3). 

Relevant  background  for  this  investigation  is  reviewed  in  the  next  section.  Our 
main  results  are  presented  in  Section  3,  where  we  show  how  to  devise  constructive 
algorithms  in  the  vast  majority  of  applications.  The  result  is  that  we  can,  in  these 
cases,  know  a  low-degree  polynomial-time  algorithm  for  search  (and,  hence, 
decision)  without  ever  knowing  the  finite  list  of  graphs  on  which  the  existence  of 
the  decision  algorithm  is  based.  Moreover,  we  show  how  to  provide  asymptotic 
optimality under very general circumstances. We also prove that, despite the
nonconstructive nature of the underlying theory, this line of research cannot settle
P = NP nonconstructively by any established method of argument. A few
concluding remarks make up the final section of this paper.


2.  Background 

The graphs we consider are finite and undirected, but may have loops and
multiple edges. A graph H is less than or equal to a graph G in the minor order, written
H ≤_m G, if and only if a graph isomorphic to H can be obtained from G by a series
of these two operations: taking a subgraph and contracting¹ an edge. A family F of
graphs is said to be closed under the minor order if the facts that G is in F and
that H ≤_m G together imply that H must be in F. The obstruction set for a
family F of graphs is defined to be the set of graphs in the complement of F that
are minimal in the minor order. If F is closed under the minor order, it has the
following characterization: G is in F if and only if there exists no H in the
obstruction set for F such that H ≤_m G.

A set along with a transitive, reflexive relation is called a quasi-order. For
example, the class of all graphs under ≤_m is a quasi-order.² A quasi-ordered set
(X, ≤) is well-quasi-ordered if (1) any subset of X has finitely many minimal
elements and (2) X contains no infinite descending chain x1 > x2 > x3 > ... of
distinct elements.

¹ An edge uv is contracted by deleting vertices u and v and adding a new vertex that is adjacent
to each vertex that was originally adjacent either to u or v.

² Some authors have found it convenient to consider isomorphic graphs as distinct. Under this view,
the minor order would not qualify as a partial order because it would not be anti-symmetric.

Theorem  1  [RS4].  Graphs  are  well-quasi-ordered  under  the  minor  relation. 

Theorem 2 [RS3]. For every fixed graph H, the problem that takes as input a
graph G and determines whether H ≤_m G is solvable in polynomial time.

Theorem 1 is often called the Graph Minor Theorem. Theorem 2 ensures
polynomial-time order tests. We term a well-quasi-ordered set with
polynomial-time order tests a Robertson-Seymour set, or an RS set for short.

Theorems  1  and  2  guarantee  only  the  existence  of  a  polynomial-time  decision 
algorithm  for  any  minor-closed  family  of  graphs.  It  has  been  shown  that  Theorem 
1  is  independent  of  constructive  axiomatic  systems  and,  indeed,  any  proof  of  it  must 
use  impredicative  methods  [FRS].  Also,  there  can  be  no  systematic  method  of 
computing  the  finite  obstruction  set  for  an  arbitrary  minor-closed  family  F  from  the 
description  of  a  Turing  machine  that  accepts  F  (we  prove  this  later). 

A noteworthy feature of Theorem 2 is the low degree of the polynomials
bounding the decision algorithms' running times. Letting n denote the number of
vertices in G, the general bound is O(n³). If a minor-closed family excludes a planar
graph, then it has bounded tree-width [RS1] and the bound is reduced to
O(n log n) [Re]. These polynomials possess enormous constants of proportionality,
rendering them impractical for problems of any nontrivial size [RS2].

For an application of Theorems 1 and 2, consider the gate matrix layout problem
[DKL]. Although the general problem is NP-complete, it has been shown [FL1]
that, for any fixed number of tracks, an arbitrary instance with n rows can be
transformed into a graph such that the family of "yes" instances is closed under the
minor order and excludes a planar graph. Thus the fixed-parameter version of gate
matrix layout can be decided in O(n log n) time.

A graph H is less than or equal to a graph G in the immersion order, written
H ≤_i G, if and only if a graph isomorphic to H can be obtained from G by a series
of these two operations: taking a subgraph and lifting³ pairs of adjacent edges. The
relation ≤_i, like ≤_m, defines a quasi-order on graphs with the associated notions
of closure and obstruction sets.
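The lifting operation can be made concrete with a short Python sketch. The encoding of a multigraph as a Counter of canonical vertex pairs and the names edge_key and lift are assumptions chosen for illustration; vertex labels are assumed to be mutually comparable. Lifting the edges uv and vw removes one copy of each and adds one copy of uw; since loops are permitted here, u and w may coincide.

```python
from collections import Counter

def edge_key(a, b):
    """Canonical key for the undirected edge ab (labels must be comparable)."""
    return (a, b) if a <= b else (b, a)

def lift(edges, u, v, w):
    """Return a new multigraph in which the adjacent edges uv and vw are lifted.

    edges is a Counter mapping edge_key(a, b) to the multiplicity of ab.
    The input Counter is left unchanged.
    """
    if u == v or v == w:
        raise ValueError("lifting requires u != v and v != w")
    needed = Counter([edge_key(u, v), edge_key(v, w)])
    if any(edges[e] < c for e, c in needed.items()):
        raise ValueError("both edges must be present")
    result = Counter(edges)
    result.subtract(needed)           # delete one copy of uv and of vw
    result[edge_key(u, w)] += 1       # add the edge uw (a loop when u == w)
    return +result                    # unary plus drops zero-count entries
```

For instance, lifting the two edges of the path 1-2-3 at the middle vertex leaves the single edge between 1 and 3.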

Theorem 3 [RS1]. Graphs are well-quasi-ordered under the immersion relation.

Theorem 4 [FL3]. For every fixed graph H, the problem that takes as input a
graph G and determines whether H ≤_i G is solvable in polynomial time.

Theorems 3 and 4, like Theorems 1 and 2, guarantee only the existence of a
polynomial-time decision algorithm for any immersion-closed family of graphs. The
method used to prove Theorem 4 yields a general time bound of O(n^(h+3)), where
h denotes the order of the largest graph in the relevant obstruction set. With
excluded-minor knowledge specific to an immersion-closed family, however, the
time complexity for determining membership can in many cases be reduced to
O(n log n) by bounding the family's tree-width.

³ A pair of adjacent edges uv and vw, with u ≠ v ≠ w, is lifted by deleting the edges uv and vw and
adding the edge uw.

For an application of Theorems 3 and 4, consider the min cut linear arrangement
problem [GJ]. Although the general problem is NP-complete, it has been shown
[FL3] that, for any fixed cutwidth, the family of "yes" instances is closed under the
immersion order and has bounded tree-width. Thus the fixed-parameter version of
min cut linear arrangement can be decided in O(n log n) time.


3.  CONSTRUCTIVIZATIONS 

Decision algorithms based on finite obstruction sets do not decide by producing
(or failing to produce) natural evidence and do not solve associated search
problems. The situation may be modeled in terms of relations. Associated with a
relation Π ⊆ Σ* × Σ* are a number of basic computational problems that arise in
the setting of well-quasi-ordered sets:

checking: the problem of determining, for input (x, y), whether (x, y) ∈ Π,

decision: the problem of determining, for input x, whether there exists a y such
that (x, y) ∈ Π, and

search: the problem of computing a search function for Π, where such a
function f: Σ* → Σ* ∪ {⊥} satisfies

(1) f(x) = y implies that (x, y) is in Π and

(2) f(x) = ⊥ ∉ Σ implies that there exists no y for which (x, y) is in Π.

Search functions can often be computed by oracle algorithms that employ an
algorithm for a related decision problem as the oracle.

Definition. A self-reduction algorithm is an oracle algorithm that
computes a search function for Π with oracle language domain(Π) = {x | there
exists a y for which (x, y) is in Π}. The overhead of such an algorithm is its time
complexity as measured by charging each oracle invocation with only a unit-time
cost.

Definition. A quasi-order (R, ≤) is uniformly enumerable if there is a
recursive enumeration (r0, r1, r2, ...) of R with the property that r_i ≤ r_j implies i ≤ j.

In the minor and immersion orders, for example, a natural uniform enumeration
is to generate all finite graphs based on a monotonically nondecreasing sequence of
number of vertices, with graphs having the same number of vertices generated
based on a monotonically nondecreasing sequence of number of edges, with ties
(graphs with the same number of vertices and the same number of edges) broken
arbitrarily.
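A minimal Python sketch of such an enumeration follows. To keep each level finite it is restricted, as an assumption of this illustration only, to simple labeled graphs (no loops or multiple edges), and it makes no attempt to remove isomorphic duplicates; the generator name is hypothetical.

```python
from itertools import combinations, count

def enumerate_simple_graphs():
    """Yield (n, edge_set) for every simple labeled graph on vertices 0..n-1.

    Graphs appear in nondecreasing order of vertex count, and, for a fixed
    vertex count, in nondecreasing order of edge count. Ties are broken by
    the order in which itertools.combinations produces edge subsets.
    """
    for n in count(0):
        pairs = list(combinations(range(n), 2))   # all possible edges
        for m in range(len(pairs) + 1):
            for edge_set in combinations(pairs, m):
                yield n, edge_set
```

Because the generator is infinite, a caller would typically consume it lazily, for example with itertools.islice.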

Definition. Under a uniform enumeration of the elements of domain(Π), we
say that a self-reduction algorithm for Π is uniform if, on input r_j, the oracle for
domain(Π) is consulted concerning r_i only for i ≤ j.

Definition. An oracle algorithm is honest if, on inputs of size n, its oracle is
consulted concerning only instances of size O(n).

Definition. An oracle algorithm A with overhead bounded by T(n) is robust
(with respect to T(n)) if A is guaranteed to halt within T(n) steps for any oracle
language.

Despite  the  nonconstructivity  inherent  in  the  tools  discussed  in  the  previous 
section,  we  now  show  that  low-degree  polynomial-time  search  (and  hence  decision) 
algorithms  can  often  be  constructed.  The  general  technique  we  present  works  in  a 
rather  surprising  fashion:  we  are  able  to  write  down  a  correct  algorithm  without 
knowing  the  complete  relevant  obstruction  set  and,  in  some  cases,  without  knowing 
the  exact  polynomial  that  bounds  the  running  time  of  the  algorithm. 

Theorem 5. Let F = domain(Π) be a closed family in a uniformly enumerable
well-quasi-order, and suppose the following are known:

(1) an algorithm that solves the checking problem for Π in O(T1(n)) time,

(2) order tests that require O(T2(n)) time,

(3) a uniform self-reduction algorithm (its time bound is immaterial), and

(4) an honest robust self-reduction algorithm that requires O(T3(n)) overhead.

Then an algorithm requiring O(max{T1(n), T2(n) · T3(n)}) time is known that solves
the search problem for Π.

Proof. Let I denote an arbitrary input instance and let K denote the known
elements of the (finite) obstruction set for F. (Initially, K can be empty.) We treat
K as if it were the correct obstruction set, that is, as if we did know a correct
decision algorithm. Since the elements of K will always be obstructions, if we find
that H ≤ I for some H ∈ K, then our algorithm reports "no" and halts. Otherwise,
after checking each element of K to confirm that it is not less than or equal to I,
we attempt to self-reduce and check the solution so obtained. If the solution is
correct, then our algorithm reports "yes," outputs the solution and halts.

In general, however, it may turn out that our check of the solution reveals that
we have self-reduced to a nonsolution. This can only mean that there is at least one
obstruction H ∉ K. In this event, we proceed by generating the elements of R in
uniform order until we find a new obstruction (an element that properly contains
no other obstruction but that cannot be uniformly self-reduced to a solution).
When such an obstruction is encountered, we need only augment K with it and
start over on I.

Let C1 denote the cardinality of the correct obstruction set and let C2 denote the
largest number of vertices in any of its elements. Then, for some suitably chosen
function f, the total time spent by this algorithm is bounded by
O(C1(T1(n) + T2(n) · T3(n)) + f(C2)). ∎
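The control flow of this proof can be sketched in Python. Everything named here is an assumption of the illustration: the order test leq, the checker check, the two self-reduction routines, and the enumerator enumerate_all are supplied by the caller, and the toy instantiation at the end (integers under ≤, with F = {0, ..., 4}) is deliberately trivial, since its checker already decides membership. It serves only to exercise the loop.

```python
from itertools import count

def search(instance, known, leq, check, self_reduce,
           uniform_self_reduce, enumerate_all):
    """Solve the search problem, learning obstructions into `known` as needed.

    Returns evidence y with check(instance, y) true, or None for "no".
    """
    def oracle(x):
        # Decide membership in F as if `known` were the full obstruction set.
        return not any(leq(h, x) for h in known)

    while True:
        if not oracle(instance):
            return None                       # a known obstruction applies
        y = self_reduce(instance, oracle)     # honest, robust self-reduction
        if y is not None and check(instance, y):
            return y                          # "yes", with checked evidence
        # The oracle misled us, so some obstruction is missing from `known`.
        for r in enumerate_all():             # elements in uniform order
            if oracle(r):                     # r contains no known obstruction
                z = uniform_self_reduce(r, oracle)
                if z is None or not check(r, z):
                    known.append(r)           # r is a new obstruction
                    break                     # start over on the instance

# Toy instantiation (hypothetical): each x in F is its own evidence.
def toy_leq(h, x):
    return h <= x

def toy_check(x, y):
    return y == x and x <= 4

def toy_reduce(x, oracle):
    return x if oracle(x) else None
```

With an initially empty list of known obstructions, calling search on the toy instance 7 (using itertools.count as the enumerator) first self-reduces to a nonsolution, then enumerates 0, 1, 2, ... until it learns the obstruction 5, and finally reports "no".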

Theorem  5  prompts  several  observations. 

Observation 1. Most (but, interestingly, not all) of the known RS set applications
that ensure polynomial-time problem complexity can be made constructive.
This follows because polynomial-time order tests are known for the minor and
immersion orders and because we know, in most cases, low-degree polynomial-time
algorithms for checking and for (uniform and honest robust) self-reducing.
Reconsider, for example, the min cut linear arrangement problem. Checking a
candidate solution is easily performed in $O(n)$ time. Order tests are $O(n \log n)$.
Uniform self-reduction is achievable (although it is rather cumbersome). Honest
robust self-reduction can be performed with $O(n)$ overhead. Therefore, Theorem 5
provides the following constructive corollary: for any fixed k, the search and
decision versions of min cut linear arrangement can be solved in $O(n^2 \log n)$ time
with a known algorithm.
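
The checking step for min cut linear arrangement can be illustrated with a short sketch. It assumes that a candidate solution is a vertex ordering and that the graph is given as an edge list; the function names `layout_cutwidth` and `check_layout` are hypothetical and are not taken from the paper. The sketch runs in time linear in the number of vertices plus edges.

```python
def layout_cutwidth(order, edges):
    """Largest number of edges cut between consecutive vertices of `order`."""
    pos = {v: i for i, v in enumerate(order)}
    diff = [0] * (len(order) + 1)
    for u, v in edges:
        a, b = sorted((pos[u], pos[v]))
        diff[a] += 1      # the edge crosses gaps a, a+1, ..., b-1
        diff[b] -= 1
    width = cut = 0
    for gap in range(len(order) - 1):
        cut += diff[gap]  # edges crossing the gap after position `gap`
        width = max(width, cut)
    return width


def check_layout(order, edges, k):
    """Checking problem: does this candidate layout have cutwidth at most k?"""
    return layout_cutwidth(order, edges) <= k


path = [("a", "b"), ("b", "c"), ("c", "d")]
good = check_layout(["a", "b", "c", "d"], path, 1)   # natural order of the path
bad = check_layout(["a", "c", "b", "d"], path, 1)    # a poor order of the same path
```

Multiple edges are handled by listing an edge more than once, and a loop contributes nothing to any cut.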

Observation  2.  For  problems  such  as  knotlessness  [FL2],  for  which  no 
algorithm  (with  any  time  bound)  for  the  decision  problem  is  constructively  known, 
but  for  which  (only  super-exponential)  algorithms  for  the  checking  problem  are 
constructively  known  [Ha],  we  have  an  unexpected  situation.  Finding  a  uniform 
self-reduction,  a  task  seemingly  very  different  from  decision,  would  provide  the  first 
known  decision  algorithm  for  this  problem. 

Observation  3.  In  Theorem  5,  it  is  of  course  possible  to  replace  the  hypothesis 
that  a  uniform  self-reduction  is  known  with  the  alternate  hypothesis  that  a  decision 
algorithm  is  known.  (In  fact,  due  to  Theorem  5  itself,  it  follows  that  this  new 
hypothesis  is  potentially  weaker.)  In  some  applications,  this  may  be  more 
convenient. 

Observation  4.  Although  an  attempt  to  implement  the  algorithm  used  in  the 
proof  of  Theorem  5  appears  at  first  to  be  out  of  the  question,  closer  inspection 
reveals  that  it  may  in  fact  provide  the  basis  for  viable  and  wholly  novel  approaches 
for  solving  a  number  of  practical  problems.  This  scheme  can  be  viewed  as  a 
learning  algorithm  that  gradually  accumulates  a  useful  subset  of  the  obstruction  set, 
and  invokes  an  exhaustive  learning  component  only  when  forced  to  do  so  by  input 
that  it  cannot  handle  with  this  subset.  Furthermore,  some  obstructions  are  already 
known  or  are  easy  to  identify  for  most  problems,  so  that  we  need  not  start  with  the 
empty  set.  Growing  evidence  appears  to  indicate  that,  for  many  problems  amenable 
to RS set theory, a relatively small collection of obstructions is often enough to
support search and decision algorithms on input generally encountered in practice
(see, for example, [LR]).

Our next result extends an entertaining (but completely unimplementable) idea
first observed in [Le] and explicitly proved in [Sc] to the setting of well-quasi-ordered
sets. The original idea applies only to the computation of search functions
restricted to domain($\Pi$). Where domain($\Pi$) is closed in a well-quasi-order, this
restriction can be lifted.

Theorem 6. Let $F = \mathrm{domain}(\Pi)$ be a closed family in a uniformly enumerable
well-quasi-order, and suppose the following are known:

(1) an algorithm that solves the checking problem for $\Pi$ in $O(T_1(n))$ time,

(2) order tests that require $O(T_2(n))$ time, and

(3) a uniform self-reduction algorithm (its time bound is immaterial).

Then an algorithm requiring $O(\max\{T_0(n) + T_1(n) \cdot \log T_0(n),\ T_2(n)\})$ time is known
that solves the search problem for $\Pi$, where $T_0(n)$ denotes the time complexity of any
algorithm solving this search problem.

Proof. Condition (3) ensures that at least one search algorithm exists, so that
$T_0$ is defined. We interleave the following two operations. In case the input is in F,
we employ the exponential form of diagonalization from [Le],⁴ which requires time
proportional to $2^{X} T_0(n) + (X + \log T_0(n))\, T_1(n)$, where X denotes the index of the
lowest-indexed Turing machine solving the search problem in time $T_0(n)$. In case
the input is not in F, we employ the uniform enumeration of the elements of the order
(along with obstruction containment tests) as we did in the proof of Theorem 5. ∎
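
The phased emulation behind the diagonalization (footnote 4) can be sketched as follows. This is an illustration only: Python generators stand in for Turing machines (each `next` call counts as one step), the names `dovetail`, `never`, and `slow_answer` are hypothetical, and the sketch assumes the reading of footnote 4 in which, during phase i, machine h is run until it has performed $2^{i-h}$ steps in total.

```python
def dovetail(machines, check, max_phase=40):
    """Emulate `machines` in phases and return (solution, machine index, phase)
    for the first candidate output that passes `check`, or None."""
    started = []                                  # [generator or None, steps done]
    for phase in range(1, max_phase + 1):
        if phase <= len(machines):
            started.append([machines[phase - 1](), 0])
        for h, state in enumerate(started, start=1):
            run, done = state
            if run is None:                       # this machine has already halted
                continue
            budget = 2 ** (phase - h)             # total steps allowed so far
            while done < budget:
                try:
                    out = next(run)
                except StopIteration:
                    state[0] = None
                    break
                done += 1
                if out is not None and check(out):
                    return out, h, phase
            state[1] = done
    return None


def never():
    while True:          # a machine that runs forever without an answer
        yield None


def slow_answer():
    for _ in range(5):   # five silent steps, then a candidate solution
        yield None
    yield 42


result = dovetail([never, slow_answer], lambda candidate: candidate == 42)
```

Because machine h receives a $2^{-h}$ share of the work, a correct machine with a fixed index is slowed down only by a constant factor, which is where the explosive constants of proportionality come from.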

Thus  one  theoretically  attains  asymptotic  optimality,  albeit  at  the  cost  of 
explosive  constants  of  proportionality.  A  curious  feature  of  the  preceding  theorem 
is  that  provision  is  made  neither  for  computing  the  relevant  obstruction  set  nor  for 
determining  the  function  T0.  That  is,  even  if  one  could  implement  the  algorithm,  no 
bounded amount of computation would necessarily reveal when one has encountered
the last obstruction or when an optimal algorithm begins to outperform all
others. 

To preface the final result of this section, we observe that many members of the
research community have invested considerable effort in producing NP-completeness
proofs for a vast array of seemingly difficult combinatorial problems. Such
proofs implicitly rely on the assumption that, if P = NP, an NP-completeness
proof is not in vain, but instead becomes a polynomial-time algorithm. There has
always been a potential flaw in this logic, namely that a proof of P = NP might
be nonconstructive. For example, what if chromatic number with, say, fixed k = 3
were minor-closed (which it is not)? After all, traditional methods for attacking the
P = NP question have to date failed; one might expect that if the issue is ever to
be resolved, new techniques must be brought to bear. We now show that such a
vexing outcome cannot happen based on well-quasi-ordered sets and known
problem reduction schemes. Recall that an RS set is a well-quasi-ordered set that
supports polynomial-time order tests. We make the additional assumption that these
order tests are known (as they are in the minor and immersion orders).

4 Turing machines are emulated in phases. During phase $i$, each machine whose index $h$ lies in the
range $[1, i]$ is emulated until it has performed $2^{i-h}$ steps.

Definition. By the statement “P is constructively equal to NP” we
mean that an algorithm is known that computes, from the index and time bound
of a nondeterministic polynomial-time Turing machine that recognizes a set X, the
index of a deterministic polynomial-time Turing machine that recognizes X. A set
X is constructively NP-hard if a polynomial-time many-to-one reduction
from satisfiability (SAT) to X is known.

Every problem presently known to be NP-hard is constructively NP-hard as
well, simply because the relevant reductions are constructively known (rather than
only known to exist). We now demonstrate that it is not possible to prove P = NP
by an obvious approach, such as searching for a known NP-complete graph
problem that can be shown to be minor-closed, except in a constructive way.

Theorem 7. Let F denote a closed family in a uniformly enumerable RS set. If it
is constructively NP-hard to determine membership in F, then P is constructively
equal to NP.

Proof. Let i denote the index of a nondeterministic Turing machine that accepts
language L in time bounded by polynomial p. We compute the index i' of a
deterministic polynomial-time Turing machine that accepts L by describing a
polynomial-time algorithm for recognizing L. For input x, we can of course use i
and p to compute (by a known algorithm) in time polynomial in |x| a Boolean
expression $E_x$ that is satisfiable if and only if $x \in L$. It is enough to argue that
knowing a reduction f from SAT to F-membership yields a known polynomial-time
algorithm for SAT.

Note that an honest robust (and uniform) polynomial-time self-reduction algorithm
for SAT is easily described, by simply taking the usual self-reduction and
guarding against a faulty oracle, precomputing the number of oracle calls required
when the oracle is trustworthy.
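
A minimal sketch of such a guarded self-reduction follows. It assumes formulas in conjunctive normal form, encoded as lists of clauses of nonzero integers (a negative integer is a negated variable); this encoding and the names `simplify`, `satisfies`, `robust_self_reduce`, and `brute_force_oracle` are illustrative assumptions rather than anything specified in the paper. The oracle is consulted exactly once per variable, and the final assignment is checked, so a faulty oracle can cause a failure report but never a wrong answer or a runaway computation.

```python
from itertools import product


def simplify(cnf, lit):
    """Make literal `lit` true: drop satisfied clauses, delete the falsified literal."""
    return [[l for l in clause if l != -lit] for clause in cnf if lit not in clause]


def satisfies(cnf, assignment):
    return all(any(assignment[abs(l)] == (l > 0) for l in clause) for clause in cnf)


def robust_self_reduce(cnf, n_vars, oracle):
    """Return a satisfying assignment, or None if none was produced.
    Makes exactly n_vars oracle calls, whatever the oracle answers."""
    assignment, current = {}, cnf
    for v in range(1, n_vars + 1):
        trial = simplify(current, v)
        if oracle(trial):                          # oracle claims: still satisfiable
            assignment[v], current = True, trial
        else:
            assignment[v], current = False, simplify(current, -v)
    return assignment if satisfies(cnf, assignment) else None   # guard by checking


def brute_force_oracle(cnf):
    """A trustworthy (exponential-time) oracle, used here only for the demonstration."""
    variables = sorted({abs(l) for clause in cnf for l in clause})
    return any(satisfies(cnf, dict(zip(variables, bits)))
               for bits in product([False, True], repeat=len(variables)))


formula = [[1, 2], [-1, 3], [-2, -3]]
honest = robust_self_reduce(formula, 3, brute_force_oracle)
fooled = robust_self_reduce(formula, 3, lambda cnf: True)
```

In the proof that follows, the role of the oracle is played by the reduction f together with the order tests against the current candidate obstruction set.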

Let $Im^+$ denote the set of all graphs that are images (under the many-to-one
reduction f) of satisfiable expressions, and let $Im^-$ denote the set of graphs that are
images of unsatisfiable expressions. Thus, $Im^+ \subseteq F$ and $Im^- \subseteq \bar{F}$. Since F is closed,
$Im^+$ is closed in $Im = Im^+ \cup Im^-$, under the inherited order. Let $O_{SAT}$ denote the
minimal elements of $Im^-$. The set $O_{SAT}$ is finite, by the well-quasi-ordering of an
RS set.

5 Observe that knowledge of a polynomial bound on acceptance time is necessary; even the proof of
Cook’s Theorem is nonconstructive without a known bound.

Let E denote an arbitrary Boolean expression. Clearly, E is satisfiable if and only
if $f(E) \not\ge y$ for each $y \in O_{SAT}$. Let $E_1, E_2, \ldots$ be a recursive enumeration of all Boolean
expressions, and let O denote the known candidates for $O_{SAT}$. (Initially, O can be
empty.) Each element of O will be greater than or equal to some obstruction, so we
treat O as if it were $O_{SAT}$. We begin by generating expressions in the enumeration
and exhaustively determining whether each is satisfiable until we encounter an $E_j$
that is unsatisfiable. Resetting O to be the minimal elements of $O \cup \{f(E_j)\}$, we use
the known order tests to learn whether f(E) contains an element of O. If it does,
then our algorithm reports “no” and halts. Otherwise, we attempt to self-reduce E
using f together with the order tests, to implement the oracle. If we succeed in
producing a satisfying truth assignment, then our algorithm reports “yes” and halts.
If the question remains unsettled (that is, the attempted self-reduction has failed,
but there is no $y \in O$ for which $f(E) \ge y$), then we resume generating expressions
until we can augment O and start over on E.

Since $O_{SAT}$ is finite, we are guaranteed to achieve $O = O_{SAT}$ within some bounded
initial segment of the enumeration, although evidence for a correct decision
concerning E may be produced well before that point is reached. The running time
of the algorithm is bounded by a polynomial function of |x|. ∎

The  method  used  to  prove  Theorem  7  can  be  viewed  as  an  extension  of  the 
technique  employed  in  the  proof  of  Theorem  5.  It  has  recently  been  suggested  that 
there  may  be  an  alternate  proof  based  on  properties  of  sparse  sets  [CG]. 


4.  Conclusions 

Our  construction  techniques  do  not  depend  on  knowing,  nor  do  they  provide  a 
means  for  computing,  the  relevant  (finite)  obstruction  sets.  An  obvious  question 
arises:  can  these  sets  be  systematically  computed?  The  following  theorem  shows 
that,  in  a  general  sense,  they  cannot.  (We  state  this  result  for  the  minor  order  of 
finite  graphs;  the  proof  can  be  easily  modified  to  handle  other  RS  sets.) 

Theorem 8. There is no algorithm to compute, from a finite description of a
minor-closed family F of graphs as represented by a Turing machine that accepts
precisely the graphs in F, the set of obstructions for F.

Proof. We reduce from the Halting Problem. Given a Turing machine M and
a word x, we can determine whether M halts on x as follows. We modify the
description of M to obtain a description of a Turing machine M' that embodies the
following algorithm. Given as input a graph G, M' first computes the index i(G) of
G in a recursive enumeration of all finite graphs that is uniform with respect to the
minor order. M' then simulates M on input x for i(G) steps.

If M does not halt in at most i(G) steps, then M' accepts G. Otherwise, if M halts
in exactly i(G) steps, then M' rejects G. If the halting time of M on input x is
$t < i(G)$, then M' determines the graph H such that $i(H) = t$, and tests whether
$G \ge_m H$. If $G \ge_m H$, then M' rejects G. Otherwise, M' accepts G.

We claim that M' accepts a minor-closed family of graphs. If M does not halt on
input x, then M' accepts all graphs, which is trivially minor-closed. If the halting
time of M on input x is t, then M' accepts precisely those graphs G for which
$G \not\ge_m H$, a minor-closed family with the single obstruction H. To see this, suppose
$G \ge_m H$. Since the enumeration is uniform, $i(G) \ge t = i(H)$, and so G is rejected by
M'. If $G \not\ge_m H$, then G is accepted by M', regardless.
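
The behavior of M' can be illustrated on a toy order. In the sketch below the natural numbers under the usual ≤ stand in for finite graphs under the minor order, the index function is the identity (which is trivially uniform), and a Python generator stands in for the machine M, one `yield` per step. All names (`halting_time`, `make_m_prime`, and the two sample machines) are hypothetical; the sketch shows only the shape of the construction and does not implement minor tests.

```python
def halting_time(machine, x, budget):
    """Return t if machine(x) halts after exactly t <= budget steps, else None."""
    run = machine(x)
    for used in range(budget + 1):
        try:
            next(run)
        except StopIteration:
            return used
    return None


def make_m_prime(machine, x, index=lambda g: g, element_at=lambda t: t,
                 geq=lambda g, h: g >= h):
    """Build the acceptor M' from M and x (toy order: integers under <=)."""
    def m_prime(g):
        t = halting_time(machine, x, index(g))
        if t is None:
            return True                     # M has not halted within i(G) steps
        return not geq(g, element_at(t))    # reject exactly when G >= H, i(H) = t
    return m_prime


def halts_after_three(x):
    for _ in range(3):
        yield


def loops_forever(x):
    while True:
        yield


accepted = [g for g in range(6) if make_m_prime(halts_after_three, "x")(g)]
accepts_all = all(make_m_prime(loops_forever, "x")(g) for g in range(50))
```

In the toy order the accepted family is {0, 1, 2}, whose single obstruction is 3, the halting time; for the machine that never halts, every element is accepted and the obstruction set is empty.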

The  description  of  M'  is  clearly  computable  from  x  and  the  description  of  M. 
Whether  M  halts  on  x  can  thus  be  determined  by  computing  the  obstruction  set  of 
the  family  of  graphs  accepted  by  M',  since  this  obstruction  set  is  empty  if  and  only 
if M fails to halt on input x. ∎


ANALYSIS OF A COMPOUND BIN PACKING ALGORITHM*

DONALD K. FRIESEN† and MICHAEL A. LANGSTON‡


Abstract. Consider the classic bin packing problem, in which we seek to pack a list of items into the
minimum number of unit-capacity bins. The worst-case performance of a compound bin packing algorithm
that selects the better packing produced by two previously analyzed heuristics, namely, FFD (first fit decreasing)
and B2F (best two fit), is investigated. FFD and B2F can asymptotically require as many as 11/9 and 5/4 times the
optimal number of bins, respectively. A new technique, weighting function averaging, is introduced to prove
that our compound algorithm is superior to the individual heuristics on which it is based, never using more
than 6/5 times the optimal number of bins.

Key words. bin packing, compound algorithms, heuristics, weighting functions, worst-case analysis
AMS(MOS) subject classifications. 68Q20, 68Q25

1. Introduction. In the usual definition of the bin packing problem, we seek to pack
the items of a list $L = \{l_1, l_2, \ldots, l_n\}$, each item with size in the range (0, 1], into the
minimum number of unit-capacity bins. It is easily verified that this problem is NP-hard.
Therefore, we focus our efforts on practical, efficient approximation algorithms in hopes
of guaranteeing near-optimal results. (Note that there are algorithms guaranteed to produce
results as close to the optimum as desired [1], [7]. Unfortunately, these algorithms
are not practical to implement because the time required to ensure results at most
$(1 + \varepsilon)$ times the optimum grows extremely rapidly as $\varepsilon$ approaches zero.)

We use worst-case analysis as a measure of the worth of a bin packing heuristic.
The heuristic may not discover the best packing, but we endeavor to show that it always
provides results close to the optimum. For some algorithm, ALG, let ALG(L) represent
the number of nonempty bins required by ALG to pack L. For instance, OPT(L) denotes
the number of bins required in an optimal packing of L. We restrict our attention to
two off-line¹ algorithms: FFD (first fit decreasing) and B2F (best two fit). Given any list
L, it is known from [6] that FFD(L) does not exceed (11/9) OPT(L) + 4, and from the
Appendix to this paper that B2F(L) does not exceed (5/4) OPT(L) + 4. Moreover, examples
exist that demonstrate that these bounds are asymptotically tight.

It seems reasonable to suggest that these two heuristics produce particularly inferior
packings for rather small, distinct regions of the input space. Based on this conjecture,
we analyze a compound algorithm, CFB, in which both FFD and B2F are applied and
the better packing selected. This notion of combining two or more heuristics is an attractive
one, but the analysis of such an algorithm can be especially difficult; only a few compound
algorithms have been successfully analyzed in the literature (see, for example, [2], [8],
[9]). We note that a tight worst-case bound of 71/60 has recently been reported for a
modification of the FFD algorithm [5], thereby yielding the lowest bound yet published
for an efficient bin packing heuristic. This bound is superior to the upper bound of 6/5 that
we prove here, but is inferior to the lower bound of 227/195 provided by the worst
examples we know of for CFB. Moreover, the novel analysis we devise for our compound
algorithm merits attention and may, we hope, be applicable in other settings.

1 An off-line algorithm is free to preview and rearrange items before it begins to pack them.

We shall employ the technique of “weighting” L so that the FFD and B2F packings
can be compared to an optimal packing. Although we would like to determine the minimum
of {FFD(L), B2F(L)}, the analysis involved is extremely complicated. Instead,
we investigate the average of {FFD(L), B2F(L)}, in an effort to obtain a weak upper
bound on the minimum. In particular we show that, after eliminating certain cases where
we can guarantee that one or the other algorithm performs within our bound of 6/5, our
weighting of L ensures that the average and hence the minimum number of bins used
by the two algorithms is within the bound.

In the next section, we present some preliminary analysis and demonstrate that
CFB(L) can be as great as (227/195) OPT(L). We also introduce a typing scheme for
the items of L based on size. In § 3, we establish the specific conditions required for the
FFD packing to use more than 6/5 times the optimal number of bins. Section 4 contains an
analogous determination for B2F. We present our main result in § 5, proving that CFB(L)
does not exceed (6/5) OPT(L) + 8. The final section contains remarks about proving a
tighter performance bound for CFB. In the Appendix, we discuss in further detail the
B2F algorithm and derive its asymptotic worst-case bound.

2. Preliminary discussion. We begin by describing the FFD and B2F heuristics
more precisely. The FFD algorithm can be implemented by first sorting all items so that
their sizes are arranged in nonincreasing order. Each bin is packed by repeatedly placing
in it the largest unpacked item that fits. When no more items are available that fit, the
next bin is packed. The B2F algorithm modifies this in the following way. First a bin is
packed as by the FFD rule. If the bin contains more than a single item, then the list is
checked to see if the smallest item in the bin could be replaced by two items that would
pack the bin more nearly full. If so, those two whose sum is largest are used in place of
the smallest item in the bin. A number of other schemes could be used to decide which
two replace the smallest item, but almost any choice will satisfy our analysis, subject to
the following modification made to simplify the proof: items of sizes less than or equal
to 1/6 will be held back until all larger items are packed. An FFD-like procedure is used to
complete the packing when only items of size no greater than 1/6 are left. The purpose of
this modification is to reduce the number of combinations to consider in proving an
asymptotic 6/5 bound, although it seems likely that this modification actually detracts
somewhat from the performance of the compound algorithm.
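
The two heuristics just described, and the compound rule that keeps the better packing, can be sketched as follows. This is a simplified illustration and not the authors' implementation: it omits the hold-back of small items, breaks ties among equally good replacement pairs arbitrarily, and uses integer sizes with an explicit capacity to avoid rounding. The function names are hypothetical.

```python
def pack_one_bin(items, cap):
    """Pack one bin by the FFD rule. `items` is sorted in nonincreasing order;
    the items chosen for the bin are removed from it."""
    chosen, room, i = [], cap, 0
    while i < len(items):
        if items[i] <= room:              # largest unpacked item that still fits
            room -= items[i]
            chosen.append(items.pop(i))
        else:
            i += 1
    return chosen


def ffd(sizes, cap=1):
    if any(s > cap for s in sizes):
        raise ValueError("item larger than a bin")
    items, bins = sorted(sizes, reverse=True), []
    while items:
        bins.append(pack_one_bin(items, cap))
    return bins


def b2f(sizes, cap=1):
    if any(s > cap for s in sizes):
        raise ValueError("item larger than a bin")
    items, bins = sorted(sizes, reverse=True), []
    while items:
        bin_ = pack_one_bin(items, cap)
        if len(bin_) > 1:
            smallest = bin_[-1]
            room = cap - sum(bin_) + smallest    # space if the smallest item leaves
            best = None
            for i in range(len(items)):
                for j in range(i + 1, len(items)):
                    pair = items[i] + items[j]
                    if smallest < pair <= room and (best is None or pair > best[0]):
                        best = (pair, i, j)
            if best is not None:
                _, i, j = best
                second, first = items.pop(j), items.pop(i)
                bin_[-1:] = [first, second]      # the pair replaces the smallest item
                items.append(smallest)
                items.sort(reverse=True)
        bins.append(bin_)
    return bins


def cfb(sizes, cap=1):
    """Compound rule: apply both heuristics and keep the packing with fewer bins."""
    return min(ffd(sizes, cap), b2f(sizes, cap), key=len)


sizes = [5, 4, 3, 3, 2, 2]
ffd_bins = ffd(sizes, cap=10)
b2f_bins = b2f(sizes, cap=10)
```

On this small list the two heuristics build different packings with the same number of bins: FFD leaves one unit of waste in its first bin, while B2F swaps the 4 for the pair 3 + 2 and fills that bin completely.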

Figure  1  depicts  the  worst  example  (independent,  of  course,  of  an  additive  constant) 
that  we  were  able  to  contrive  for  the  CFB  algorithm.  For  simplicity,  the  bin  size  has  been 
expanded  to  559.  All  of  the  examples  we  devised  that  were  even  close  to  being  this  poor 
were  dependent  on  the  small  items  being  held  back,  so  that  the  FFD  and  B2F  packings 
are  the  same. 

We denote the size of an item $l_i \in L$ by $s(l_i)$. Thus, after sorting, $s(l_1) \ge
s(l_2) \ge \cdots \ge s(l_N)$. We use last to denote the index of the last item packed by FFD.
Note that $l_{last}$ may not be the smallest item in L, since smaller items may have been
packed earlier where $l_{last}$ did not fit.

To prove that 6/5 is an asymptotic upper bound on the worst-case behavior of CFB,
we now proceed by contradiction and henceforth assume that L denotes a counterexample.
That is, we assume that both FFD(L) and B2F(L) exceed (6/5) OPT(L) + 8. Without
loss of generality, we also assume that L is minimal. By this we mean that no counterexample
exists with which OPT can use fewer bins, and that no counterexample is possible
with fewer items for this minimal number of bins. (Of course, minimality for CFB does
not imply minimality for either FFD or B2F alone.)

FIG. 1. Example for which CFB(L) = (227/195) OPT(L), using bin size 559: (a) the CFB packing; (b) an optimal packing.

An immediate consequence of this is that L contains no item whose size is less than
or equal to 1/6. If it did, then minimality requires that one or more such items must be
packed in the last bin by either the FFD or the B2F algorithm, in which case all preceding
bins would be packed to a level of at least 5/6. A simple “conservation of size” argument
ensures that, for such a list, no packing could use fewer than (5/6)(CFB(L) − 1) bins.
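
The “conservation of size” step can be written out explicitly. The display below is a sketch that takes the fill level to be 5/6 (a reading of the damaged fraction in the scan, chosen to agree with the 6/5 bound) and writes S(L) for the total size of the items of L, a symbol introduced here only for illustration.

```latex
\begin{align*}
S(L) &\ge \tfrac{5}{6}\,\bigl(\mathrm{CFB}(L) - 1\bigr)
  && \text{every bin but the last is filled to at least } \tfrac{5}{6}, \\
\mathrm{OPT}(L) &\ge S(L)
  && \text{each bin has unit capacity}, \\
\mathrm{CFB}(L) &\le \tfrac{6}{5}\,\mathrm{OPT}(L) + 1
  && \text{combining the two lines.}
\end{align*}
```

Under these assumptions such a list cannot exceed (6/5) OPT(L) + 8, so it cannot be a counterexample.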

With this in mind, we let $s(l_{last}) = 1/6 + \Delta$, for some $\Delta > 0$. Since no item has size
less than or equal to 1/6, we know that no bin in any packing of L has more than
five items.

We use the notation $B^*$ for an arbitrary bin of the optimal packing, and $|B^*|$ to
denote the number of items $B^*$ contains. For the bins of the FFD or B2F packing, we
use $B_1, B_2, \ldots$ as the sequence of bins in the order in which they are packed.

Lemma 2.1. L contains no item $l_i$ with $s(l_i) \ge 2/3$.

Proof. To obtain the proof, assume otherwise. In both the FFD and B2F packings,
the largest item $l_1$ is packed in $B_1$ with at most one other item, the largest that would fit.
The optimal bin containing $l_1$ can contain at most one additional item and in fact can
be packed no better than $B_1$. If the item or items of $B_1$ are removed from L, then all
three of FFD(L), B2F(L), and OPT(L) can be reduced by one, contradicting the
presumed minimality of L with respect to CFB. □

There  can  be  no  bin  containing  only  one  item  in  the  FFD  packing  ( except,  possibly, 
for  the  last  bin).  If  there  were,  s(liast)  must  exceed  since  otherwise  /iast  would  have  fit, 
and  it  is  known  that  FFD  (L)  is  bounded  by  (l)  OPT  (L)  +  2  whenever  s(l last)  exceeds 
(See  [6,  Thm.  4.10].)  From  this  it  also  follows  that  A  must  be  less  than  or  equal 
tow- 

Each item of L is assigned a type as shown in Table 1. Although this typing scheme
is motivated by the structure of a typical packing produced by the FFD rule (more will
be said on this in the next section), we classify items exclusively by their size so that we
can compare both FFD(L) and B2F(L) to OPT(L). Note that $\Delta$ cannot exceed 1/30 if
$Y_4$ or $X_5$ items exist.

Table 1
Item types based on size.

Type      Min size                      Max size
$Y_1$     $> 1/2$
$X_2$     $> 5/12 - \Delta/2$           $\le 1/2$
$Y_2$     $> 1/3$                       $\le 5/12 - \Delta/2$
$X_3$     $> 5/18 - \Delta/3$           $\le 1/3$
$Y_3$     $> 1/4$                       $\le 5/18 - \Delta/3$
$X_4$     $> 5/24 - \Delta/4$           $\le 1/4$
$Y_4$     $> 1/5$                       $\le 5/24 - \Delta/4$
$X_5$                                   $\le 1/5$

3. A close look at FFD. We say that an item is “regular” if there is no larger item
available when it is packed. A “fallback” item is one that is packed when one or more
larger items are available. Thus the notation we have used in Table 1 roughly agrees with
the way items are packed by FFD. That is, regular items of type $X_i$ are generally packed
by FFD in a bin consisting of the i largest items available when the bin is packed; we
call such a bin an $X_i$ bin. Regular items of type $Y_i$ are generally packed with $i - 1$ other
$Y_i$ items and a (smaller) fallback item. We call such a bin a $Y_i$ bin. (Note that no $Y_i$ bin,
$i \ge 2$, can have more than one fallback item, as the following argument shows. If two
fallback items are used, then they combine to fill more than 1/3 of the bin. In this event,
however, the two or more regular items fill less than 2/3 of the bin, and the smaller regular
item has a size less than 1/3, implying that another regular item would have fit in the bin
as well.)

This motivates the range of sizes we have selected for each item type. For example,
the sum of the sizes of the two items in an $X_2$ bin must exceed $1 - (1/6 + \Delta)$, or else $l_{last}$
would have been used as a fallback item in that bin. Hence, with the exception of items
from the first or last $X_2$ bin, every regular $X_2$ item must have a size in the range
$(5/12 - \Delta/2,\ 1/2]$. Similar size restrictions are used to define the other item types as summarized
in Table 1. We use these same size ranges to assign a type to each fallback item.

There may also be some bins, which we define as “exceptional” for the FFD packing,
that are not packed by FFD with items of the expected sizes. These can only be the first
or last bins of a particular type, subject to the following constraints. If the last bin of type
$Y_i$ is exceptional (that is, it does not contain i items of type $Y_i$), then the next bin is an
$X_{i+1}$ bin that is not exceptional if there are at least two $X_{i+1}$ bins. Similarly, if the last
bin of type $X_i$ is exceptional, then the first bin of type $Y_i$ is not exceptional unless it is
also the last $Y_i$ bin.

Consequently, there are at most eight exceptional bins in the FFD packing, including
the last bin packed (which contains $l_{last}$). We define an exceptional item to be one packed
in an exceptional bin or one smaller than $l_{last}$.

We now seek to determine the precise conditions necessary for FFD(L) to exceed
(6/5) OPT(L) + 8. In this effort, we employ a weighting function $w_F : L \to \mathbb{R}^+$.
We extend $w_F$ to subsets of L in the obvious fashion. For example, $w_F(B_j)$ denotes
$\sum_{l_i \in B_j} w_F(l_i)$. Our intent is to assign each item as small a weight as possible and yet
ensure that the weight of any nonexceptional FFD packed bin is at least 1. Table 2
describes our definition of $w_F$ for nonexceptional items.

Table 2
Weighting function $w_F$ based on FFD packing.

Type of nonexceptional items
in an FFD-packed bin                      Weights assigned
$Y_1$, any two items                      3/5, 1/5, 1/5
$Y_1$, $X_2$                              3/5, 2/5
$Y_1$, $Y_2$                              2/3, 1/3 if $\exists\, Y_2$ bin(s), else 11/15, 4/15
$Y_1$, $X_3$                              2/3, 1/3
$Y_1$, $Y_3$                              11/15, 4/15
$Y_1$, $X_4$                              3/4, 1/4
$Y_1$, $Y_4$ or smaller item              4/5, 1/5
$X_2$, $X_2$                              1/2, 1/2
$Y_2$, $Y_2$, $X_3$                       11/30, 11/30, 4/15
$Y_2$, $Y_2$, $Y_3$ or smaller item       2/5, 2/5, 1/5
$X_3$, $X_3$, $X_3$                       1/3, 1/3, 1/3
$Y_3$, $Y_3$, $Y_3$, any item             4/15, 4/15, 4/15, 1/5
$X_4$, $X_4$, $X_4$, $X_4$                1/4, 1/4, 1/4, 1/4
any five items                            1/5, 1/5, 1/5, 1/5, 1/5

Recall that fallback items, like regular items, are assigned a type based on their size.
We deviate slightly from this definition of $w_F$ for items packed in $Y_1$ bins. Consider any
two $Y_1$ items a and b, where a precedes b. Since $s(a) \ge s(b)$, we increase $w_F(a)$, if
necessary, to ensure that $w_F(a) \ge w_F(b)$ and reduce the weight of any item(s) packed with
a accordingly. For future reference, we state this formally as follows:

$Y_1$ weighting rule: If a and b are $Y_1$ items, and a is packed in a bin before the bin
containing b, then $w_F(a) \ge w_F(b)$.

An exceptional item receives a weight of zero, completing our definition of $w_F$. For
the convenience of the reader, Table 3 provides a listing of the possible weights for each
nonexceptional item type.
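
The stated intent of the weighting, that every nonexceptional FFD-packed bin has weight at least 1, can be checked mechanically. The sketch below assumes the weights of Table 2 read as listed in the dictionary (a reconstruction from a damaged scan, so the entries are an assumption to be checked against the published table); the names `TABLE_2` and `bin_weights` are hypothetical.

```python
from fractions import Fraction as Fr

# Assumed reading of Table 2: bin composition -> weights of its items.
TABLE_2 = {
    "Y1, any two items":          [Fr(3, 5), Fr(1, 5), Fr(1, 5)],
    "Y1, X2":                     [Fr(3, 5), Fr(2, 5)],
    "Y1, Y2 (some Y2 bin)":       [Fr(2, 3), Fr(1, 3)],
    "Y1, Y2 (no Y2 bin)":         [Fr(11, 15), Fr(4, 15)],
    "Y1, X3":                     [Fr(2, 3), Fr(1, 3)],
    "Y1, Y3":                     [Fr(11, 15), Fr(4, 15)],
    "Y1, X4":                     [Fr(3, 4), Fr(1, 4)],
    "Y1, Y4 or smaller":          [Fr(4, 5), Fr(1, 5)],
    "X2, X2":                     [Fr(1, 2), Fr(1, 2)],
    "Y2, Y2, X3":                 [Fr(11, 30), Fr(11, 30), Fr(4, 15)],
    "Y2, Y2, Y3 or smaller":      [Fr(2, 5), Fr(2, 5), Fr(1, 5)],
    "X3, X3, X3":                 [Fr(1, 3)] * 3,
    "Y3, Y3, Y3, any item":       [Fr(4, 15)] * 3 + [Fr(1, 5)],
    "X4, X4, X4, X4":             [Fr(1, 4)] * 4,
    "any five items":             [Fr(1, 5)] * 5,
}


def bin_weights(table):
    """Total weight of each bin composition, computed exactly."""
    return {kind: sum(weights) for kind, weights in table.items()}


totals = bin_weights(TABLE_2)
light_bins = [kind for kind, total in totals.items() if total < 1]
```

Under this reading every row sums to exactly 1, so `light_bins` is empty and no weight could be lowered without dropping some bin type below 1.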

Lemma  3.1.  The  FFD  weight  of  an  optimal  bin  cannot  exceed  §  unless  the  bin 
contains  a  Yx  item  or  a  Y2  item  whose  FFD  weight  exceeds  5. 

Proof.  Suppose  that  2?*  is  a  bin  of  the  optimal  packing  that  has  weight  greater  than 
§  and  2?*  contains  neither  of  the  items  mentioned  in  the  statement  of  the  lemma.  Clearly 
B*  must  contain  at  least  3  items. 

Case  1.  Suppose  |  B*  |  =3.  Then  at  least  one  item  must  have  weight  greater  than 
j  and,  from  the  assumptions  of  the  lemma,  it  can  only  have  type  X2.  There  cannot  be 
two  such  items,  or  else  no  item  larger  than  /last  could  fit  with  them.  Thus  wF(B*)  ^  \  + 

liX-1 
3  T-  3  “  6* 


Table  3 

Possible  FFD  weights  for  each 
nonexceptional  item  type. 


Type 

Weight 

Yx 

4  3  11  2  3 

5s  45  15s  35  5 

x2 

1  2 

2s  5 

y2 

2  ii  1  ± 

55  305  35  15 

x3 

1  4  1 

3s  155  5 

Y$ 

4  1 

15s  5 

X4 

1  1 

45  5 

X5  or  Y4 

1 

5 
Case 2. Suppose |B*| = 4. If B* does not contain an X2 item, then the smallest
item packed must have weight exceeding 1/5, else wF(B*) ≤ 3(1/3) + 1/5 = 6/5. No item larger
than l_last can be packed with three items of type X3, and there cannot be four items
greater than 1/4 in size. Thus there must be one item of weight at most 1/4 and one other
item of type Y3 or smaller, and wF(B*) is at most 1/3 + 1/3 + 4/15 + 1/4 < 6/5. Consequently, B*
must contain an X2 item. The second largest item of B* must be of type Y3 or X4,
implying that the remaining two items either (1) are each of type Y4 or less or (2) contain
an item smaller than l_last. In either case, wF(B*) < 6/5.

Case 3. Suppose |B*| = 5. B* must contain an X3 or Y3 item since it can contain
neither a Y2 item nor four X4 items and any item as large as l_last. Therefore, the second
largest item of B* must be of type X4, implying that the remaining three items either
(1) are each of type Y4 or less or (2) contain an item smaller than l_last. In either case,
wF(B*) < 6/5. □
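
The case bounds in the proof of Lemma 3.1 are sums of a few fractions, so they are easy to re-check with exact arithmetic. The following Python sketch does this under the assumption that the per-type weight limits are the ones listed in Table 3 (X2 at most 1/2; X3 at most 1/3; Y3 at most 4/15; X4 at most 1/4; Y4 and X5 at most 1/5). The variable names are ours and are only illustrative.

```python
from fractions import Fraction as F

BOUND = F(6, 5)  # the bound claimed by Lemma 3.1

# Case 1: one X2 item (weight 1/2) and two items of weight at most 1/3.
case1 = F(1, 2) + 2 * F(1, 3)

# Case 2, no X2 item: weights at most 1/3, 1/3, 4/15 and 1/4.
case2_no_x2 = F(1, 3) + F(1, 3) + F(4, 15) + F(1, 4)

# Case 2, with an X2 item: X2, then a Y3 or X4 item, then two items
# of type Y4 or less (weight at most 1/5 each).
case2_x2 = F(1, 2) + F(4, 15) + 2 * F(1, 5)

# Case 3: an X3 or Y3 item, an X4 item, and three items of type Y4 or less.
case3 = F(1, 3) + F(1, 4) + 3 * F(1, 5)

for name, total in [("case 1", case1), ("case 2 (no X2)", case2_no_x2),
                    ("case 2 (X2)", case2_x2), ("case 3", case3)]:
    print(f"{name}: {total} <= {BOUND}: {total <= BOUND}")
```

Every total stays at or below 6/5, in agreement with the case analysis; the sketch checks only the arithmetic, not the size arguments that rule out the other configurations.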

Lemma 3.2. The FFD packing of L contains no Y2 bin.

Proof. Suppose there is a Y2 bin. Consider the sorted sublist L' obtained from L by
deleting every item that is smaller than 1/6 + Δ, every item that is larger than 2/3 - 2Δ, and
every item that is placed in a bin with an item larger than 2/3 - 2Δ in the FFD packing of
L. Clearly, the FFD packing of L' must also have a Y2 bin. Moreover, since FFD(L) >
(6/5) OPT(L) + 8, it follows that FFD(L') > (6/5) OPT(L') + 8. (Deleting items smaller
than 1/6 + Δ does not affect the number of bins used by FFD and cannot increase the
number required by OPT. After that, as long as the first item of the list is larger than
2/3 - 2Δ, it and any other item FFD packs in B1 can be deleted, reducing the number of
bins used by FFD by one and the number needed by OPT by at least one.) Thus, from
these observations and the last lemma, it suffices to restrict our attention to L' and an
optimal bin B* that contains z, a Y1 item or a Y2 item whose FFD weight exceeds 1/3,
and show that, due to the presence of a Y2 bin, wF(B*) ≤ 6/5. We assume wF(B*) > 6/5 and
consider the possible cases.

Case 1. Suppose z is a Y1 item.

Suppose wF(z) > 3/5. Then the smaller Y2 item in the Y2 bin did not fit with z in the
FFD packing. Hence s(z) > 1 - (5/12 - Δ/2) = 7/12 + Δ/2. If |B*| = 2, then the second
item can have weight at most 2/5 and since the weight of z is at most 4/5, wF(B*) ≤ 6/5. Since
|B*| must be less than 4, we must have |B*| = 3. If the second largest item were
at least 1/4 in size, no third item would fit. If both items are of type Y4 or X5, then
wF(B*) = 4/5 + 2(1/5) = 6/5. Thus there must be an X4 item in B* and, moreover, it
must have weight 1/4. If FFD packs this X4 item in a bin with subscript less than that
of the bin containing z, then the Y1 weighting rule implies that its weight is at most
1 - wF(z) and we would get wF(B*) ≤ 6/5. But this X4 item would fit with z, so the item
packed with z by the FFD algorithm is at least as large as an X4 item. Thus wF(z) ≤
3/4 and wF(B*) ≤ 6/5. (Note that the Y1 weighting rule cannot cause z to have a weight
exceeding 3/4 unless every X4 item has a weight less than 1/4.)

Now suppose that wF(z) = 3/5. Then certainly |B*| = 3. No item of size greater than
1/3 can then be used. If either of the other items had weight less than 1/3, then wF(B*) ≤
3/5 + 1/3 + 4/15 = 6/5. However, the only items of weight 1/3 have size greater than 1/4, and no two
items of size greater than 1/4 could fit with a Y1 item. We conclude that z cannot be a
Y1 item.

Case 2. Suppose z is a Y2 item.

Clearly |B*| = 3 or 4. Suppose |B*| = 3. The only possible problem occurs if B*
contains an X2 item, a, and an X3 item, b. In this event, Δ > 1/30, or else s(B*) > 1. But
then s(z) + s(b) > 1/3 + 5/18 - Δ/3 > 2/3 - 2Δ, the maximum size for a Y1 item. Thus a
would fit with any Y1 item. Since it must be the case that wF(a) = 1/2, all fallback items
in Y1 bins must be of type X2. Therefore z is packed by FFD into some Y2 bin, Bi.
Certainly b would fit as the fallback item in Bi, and we conclude that either wF(b) =
4/15 or wF(z) = 11/30. In either case, wF(B*) ≤ 6/5.

Suppose |B*| = 4. The second largest item of B* can only be of type X3, the third
only of type X4. Thus wF(B*) < 6/5 unless the smallest item is an X4 item as well. But this
is impossible, since s(Y2) + s(X3) + 2s(X4) ≤ 1 implies Δ > 1/30 and s(Y2) + s(X3) + 2s
(any item) ≤ 1 implies Δ < 1/30. We conclude that z cannot be a Y2 item.

By definition, wF(L') ≥ FFD(L') - 8. Lemma 3.1 and the analysis just
completed demonstrate that wF(L') ≤ (6/5) OPT(L'). Hence we derive FFD(L') ≤
(6/5) OPT(L') + 8, contradicting the presumed existence of a Y2 bin. □
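
For readers who wish to experiment with small lists, the following is a minimal Python sketch of the standard First Fit Decreasing rule that the weighting argument analyzes. It is not taken from the paper: the function name, the unit capacity default, and the use of exact fractions are our own choices.

```python
from fractions import Fraction


def first_fit_decreasing(sizes, capacity=Fraction(1)):
    """Pack items by First Fit Decreasing; return the bins as lists of sizes."""
    bins = []
    # Consider the items in nonincreasing order of size.
    for size in sorted(sizes, reverse=True):
        # Place the item in the lowest-indexed bin that still has room.
        for contents in bins:
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            # No open bin can hold the item, so a new bin is started.
            bins.append([size])
    return bins


items = [Fraction(7, 10), Fraction(1, 2), Fraction(3, 10),
         Fraction(3, 10), Fraction(1, 5)]
packing = first_fit_decreasing(items)
print(len(packing), packing)
```

On this small list the two largest items each open a bin (they play the role of the items of size greater than 1/2), and the remaining items are absorbed as fallback items.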

We  state  here  some  important  consequences  that  follow  from  our  analysis  of  the 
FFD  packing. 

Corollary 3.1. If x is a Y2 item, then wF(x) ≤ 1/3. If B* is any optimal bin not
containing a Y1 item, then wF(B*) ≤ 6/5.

Lemma 3.3. If B* is any bin of the optimal packing containing an item of size less
than 1/6 + Δ, then wF(B*) ≤ 1.

Proof. Suppose B* contains such an item, a. Then certainly |B*| must be at least
3, since a is exceptional and therefore wF(a) = 0.

Case 1. Suppose |B*| = 3. Then there must be a Y1 item, b. The remaining item,
c, would fit when b was packed. If it is unavailable, then its weight is at most 1 - wF(b)
by the Y1 weighting rule. If it is available, then the item used in place of c must be at
least as large. Since s(c) < 1/2, there is no way for c to receive more weight than the item
packed with b by FFD (see Tables 1, 2, and 3).

Case 2. Suppose |B*| = 4. There must be an item of weight exceeding 1/3 that, by
Lemma 3.2, cannot be of type Y2. Thus it must be an X2 item. If each of the remaining
items have weight at most 1/4, then the lemma holds for B*, so there must be a Y3 or X3
item. If both items have size at least 1/6 + Δ, then s(B*) > 5/12 - Δ/2 + 1/4 + 1/6 + 1/6 +
Δ > 1. On the other hand, if there is a second item whose size is less than 1/6 + Δ, then
certainly wF(B*) ≤ 1.

Case  3.  Suppose  |  B*  |  =5.  There  must  be  an  item  of  weight  exceeding  \,  or  else 
wf(B*)  ^  4(\).  It  cannot  be  larger  than  5  in  size,  so  it  must  be  of  type  X3  or  Y3.  There 
cannot  be  two  items  exceeding  \  in  size,  or  else  s(B*)  >  1.  The  remaining  three  items 
must  all  have  size  at  least  {  +  A.  If  two  are  less  than  5  in  size,  however,  ^  |  + 

{  +  2(|)  <  1.  If  two  are  to  receive  weight  however,  s(B*)  >|  +  -^-A/2+|  + 

l  +  A  >  1. 

Thus, in any case, we conclude that wF(B*) is at most 1 if B* contains an item
smaller than l_last. □

4. A close look at B2F. We now seek to determine the precise conditions necessary
for B2F(L) to exceed (6/5) OPT(L) + 8. In defining the weighting function wB for the
B2F packing, we shall retain the type classification described in § 2. That is, items are
still classified strictly according to size as listed in Table 1. Most of our definition for wB
is straightforward and is given in Table 4.

The definition of wB for items in Y1 bins is more complicated and is described in
the following paragraphs.

We wish to maintain the fact that the sum of the weights of the items in any
nonexceptional bin is 1. Thus in any Y1 bin with only one item, that item has weight 1. (Unlike
the FFD packing, such a one-item bin may exist in the B2F packing.) We would also
like to keep smaller Y1 items from having greater weight than larger ones, and we would
like the fallback items to have their weight assigned according to their type. The difficulty
comes with small items (those with size less than or equal to 1/3), in which case wB depends
on the last such item packed in a Y1 bin.

Table 4
Weighting function wB for bins not containing an item of size exceeding 1/2 in B2F packing.

Type of nonexceptional items in a B2F-packed bin                Weights assigned
X2 or Y2, X2 or Y2                                              1/2, 1/2
X2 or Y2, X2 or Y2, any item                                    2/5, 2/5, 1/5
X2 or Y2, X3 or Y3, X3 or Y3                                    2/5, 3/10, 3/10
X2 or Y2, X3 or Y3, X4 or smaller item                          1/2, 3/10, 1/5
X2 or Y2, X4 or Y4, X4 or Y4                                    1/2, 1/4, 1/4
X3 or Y3, X3 or Y3, X3 or Y3                                    1/3, 1/3, 1/3
X3 or Y3, X3 or Y3, X3 or Y3, X4 or smaller item                4/15, 4/15, 4/15, 1/5
X3 or Y3, X3 or Y3, X4 or smaller item, X4 or smaller item      3/10, 3/10, 1/5, 1/5
X4 or Y4, X4 or Y4, X4 or Y4, X4 or Y4                          1/4, 1/4, 1/4, 1/4
any five items                                                  1/5, 1/5, 1/5, 1/5, 1/5
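
The requirement that the weights in any nonexceptional bin sum to 1 can be checked mechanically against Table 4. The Python sketch below does so with the weights transcribed by hand from the table; the row labels are informal shorthand of ours, not the paper's notation.

```python
from fractions import Fraction as F

# Each entry: (informal description of the bin, weights assigned by Table 4).
TABLE_4 = [
    ("X2/Y2, X2/Y2",                   [F(1, 2), F(1, 2)]),
    ("X2/Y2, X2/Y2, any",              [F(2, 5), F(2, 5), F(1, 5)]),
    ("X2/Y2, X3/Y3, X3/Y3",            [F(2, 5), F(3, 10), F(3, 10)]),
    ("X2/Y2, X3/Y3, <=X4",             [F(1, 2), F(3, 10), F(1, 5)]),
    ("X2/Y2, X4/Y4, X4/Y4",            [F(1, 2), F(1, 4), F(1, 4)]),
    ("X3/Y3 x3",                       [F(1, 3)] * 3),
    ("X3/Y3 x3, <=X4",                 [F(4, 15)] * 3 + [F(1, 5)]),
    ("X3/Y3 x2, <=X4 x2",              [F(3, 10)] * 2 + [F(1, 5)] * 2),
    ("X4/Y4 x4",                       [F(1, 4)] * 4),
    ("any five items",                 [F(1, 5)] * 5),
]

# Total weight of each bin configuration; every one should equal 1.
row_sums = {label: sum(weights) for label, weights in TABLE_4}
for label, total in row_sums.items():
    print(f"{label:22s} -> {total}")
```

Each row totals exactly 1, which is the invariant used later when the weight of the whole list is compared with the number of bins.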

Specifically, let h be the index of the last bin in the B2F packing containing a Y1
item, no X2 or Y2 item, and at most one fallback item. All subsequent Y1 bins contain
either two fallback items or one fallback item of type Y2 or X2. In either case, the Y1
item is given weight 3/5. If there is one fallback item, it is given weight 2/5; if there are two,
each is given weight 1/5.

If |Bh| = 1, then Bh's Y1 item and all earlier Y1 items are assigned weight 1, and
all earlier fallback items are assigned weight zero.

If Bh = {y, x}, where y is of type Y1 and s(x) ≤ 1/3, then we determine the weight
of x by examining all items of size less than or equal to s(x) that are packed after the
last Y1 item. That is, we set wB(x) = max {wB(t) | s(t) ≤ s(x), t not packed in a Y1 bin}.
Of all items that are available when x is packed that would fit (no larger item would fit),
and that are not packed in Y1 bins, we choose the one that has maximum weight (using
Table 4). If there are no such items, then we set wB(x) = 0.

Once Bh and wB(x) have been determined, the rest of wB is defined as follows. The
Y1 item y in Bh is given weight wB(y) = 1 - wB(x). Since s(x) ≤ 1/3 and the maximum
size of any Y1 item is 2/3, x must have fit in any preceding bin. Thus each such bin contains
either two fallback items, or one fallback item at least as large as x. All Y1 items preceding
Bh are assigned weight wB(y). If there are two fallback items, each is assigned weight
wB(x)/2; if there is only one, it is assigned weight wB(x).

If Bh does not exist, then h = 0 and all Y1 items are assigned weight 3/5 with their
associated fallback items given weight 2/5, or 1/5 each if there are two of them.
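
The rules for Y1 bins can be summarized in a few lines of code. The sketch below is only our reading of the definition: it assumes that h and wB(x) have already been determined as described (with wB(x) = 0 when Bh holds a single item), and the function name and argument names are hypothetical.

```python
from fractions import Fraction as F


def y1_bin_weights(index, h, num_fallback, w_x=F(0)):
    """Return (weight of the Y1 item, list of fallback-item weights).

    index        -- position of the Y1 bin in the B2F packing (1-based)
    h            -- index of the bin B_h, or 0 if no such bin exists
    num_fallback -- number of fallback items packed with the Y1 item
    w_x          -- wB(x) for the fallback item x of B_h (0 if |B_h| = 1)
    """
    if index > h:
        # Y1 bins after B_h: one X2/Y2 fallback item, or two fallback items.
        if num_fallback == 1:
            return F(3, 5), [F(2, 5)]
        if num_fallback == 2:
            return F(3, 5), [F(1, 5), F(1, 5)]
        raise ValueError("a Y1 bin after B_h has one or two fallback items")
    # B_h itself and every earlier Y1 bin: the Y1 item gets 1 - wB(x) and
    # the fallback items share wB(x) equally.
    if num_fallback == 0:
        if w_x != 0:
            raise ValueError("a one-item bin requires wB(x) = 0")
        return F(1), []
    return 1 - w_x, [w_x / num_fallback] * num_fallback


print(y1_bin_weights(5, 4, 2))            # a bin after B_h
print(y1_bin_weights(2, 4, 2, F(1, 4)))   # a bin before B_h, wB(x) = 1/4
```

In every branch the Y1 weight and the fallback weights add up to 1, matching the stated goal that each nonexceptional bin has total weight 1.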

The  example  depicted  in  Fig.  2  illustrates  the  role  of  Bh  in  determining  wB.  Types 
of  items  packed  in  each  bin  are  given  on  the  inside,  wB  is  listed  on  the  outside.  In  this 
example,  h  =  4,  and  one  of  the  X4  items  in  Bt  is  no  larger  than  the  X4  item  in  B4. 

Definition. The following bins are exceptional for the B2F packing: the last bin
to contain an item of each of the types X2, Y2, X3, Y3, X4, Y4; the last bin containing
exactly three X3 or Y3 items; and the last bin of the packing.

In general, therefore, the last bin containing an item of a particular type is exceptional,
although Y1 and X5 items are excluded from this. Note that if an X2 item is packed with
two Y4 items, there can be no X2 items left (since any X2 item is larger than any two Y4
items) and the bin is exceptional. If an X2 item is packed with an X4 item and a Y4 item,
there can be no X4 items left and the bin is exceptional. Similarly, if an X2 or Y2 item is
packed with an X4 or Y4 item and an X5 item, the bin is exceptional since there can be
no more X4 or Y4 items available. A bin whose largest item is of type X3 (Y3) is configured
as described in Table 4 unless there are no more X3 (Y3) items available. Also, a bin
whose largest item is of type X4 (Y4) is configured as described in Table 4 unless there
are no more X4 (Y4) items left. Finally, the last bin containing three X3 or Y3 items and
nothing else is classified as exceptional. Although this bin might not otherwise qualify
as exceptional, it could cause problems in our proof if its items were each to receive
weight 1/3.

Fig. 2. The role of Bh in determining wB. Here, h = 4.

We  conclude  that  there  are  at  most  eight  exceptional  bins  in  the  B2F  packing.  We 
define  an  exceptional  item  simply  as  one  packed  in  an  exceptional  bin.  Such  an  item 
receives  a  weight  of  zero,  completing  our  definition  of  wB.  For  the  convenience  of  the 
reader,  Table  5  provides  a  listing  of  the  possible  weights  for  each  nonexceptional 
item  type. 

Before proceeding with the principal results of this section, we first prove some
preliminary lemmas that reveal details of the B2F packing. The first of these concerns
the occurrence of items of weight 1/3, the second the impossibility of a certain configuration
containing Y3 and Y4 items.

Lemma 4.1. If there is an item, x, of B2F weight 1/3, then there must be a bin in the
B2F packing containing exactly three items, each of which has size no larger than s(x).

Proof. The only possible types for x are X3 and Y3, and the only possible bins for
x to be packed in are a three-item bin or one with an item of type Y1, packed in a bin
Bi, where i ≤ h. Suppose x is packed with a Y1 item. From the definition of wB, it is
clear that the fallback item in Bh also has weight 1/3 and is no larger than x. Without loss
of generality, we can assume that x is the fallback item in Bh. If x has weight 1/3, however,
then there must be another item packed after the Y1 bins that is no larger than x and
has weight 1/3. From Table 4 we know that this item must be packed in a bin containing
exactly three items each of weight 1/3. Moreover, we know from Table 1 that any one of
these three items would have been used in place of x if it were larger. Thus we may
assume that the lemma holds unless x is an X3 or Y3 item packed in a three-item bin.
However, the last such bin will contain the three smallest such items (and hence is
exceptional). Thus the last three-item bin satisfies the conditions of the lemma. □

Table 5
Possible B2F weights for nonexceptional items in a bin Bi where i exceeds h.

Type    Weight
Y1      3/5
X2      1/2, 2/5
Y2      1/2, 2/5
X3      1/3, 3/10, 4/15, 1/5
Y3      1/3, 3/10, 4/15, 1/5
X4      1/4, 1/5
Y4      1/4, 1/5
X5      1/5

Lemma 4.2. If there is a Y4 item of B2F weight 1/4, then there is no Y3 item of B2F
weight 1/3.

Proof. Suppose there are such Y3 and Y4 items. In order to have a Y3 item of weight
1/3, there must be a B2F bin, B, containing three Y3 items and nothing else. (A bin with
three items, some of type X3 and some of type Y3, would be exceptional since it would
contain the last X3 item, and hence its items would have weight zero.) Of course, a Y3
item can have weight 1/3 if it is packed in a Y1 bin, but even in this case there must be
another bin containing three Y3 items of weight 1/3. If there were two Y4 items available
when B was packed, then they would have replaced the last Y3 item since any two Y3
items and any two Y4 items will always fit in a single bin. Thus the Y4 items must have
been packed as fallback items in a bin before B was packed. The only way such a fallback
item can have weight 1/4 is if it is packed in a bin containing a Y2 item and two Y4 items.
(Note that a bin containing an X2 item and two Y4 items, or one X4 item and one Y4
item, is exceptional.) But if such a bin were to occur before B, then two of the available
Y3 items would have been packed instead with the Y2 item. Thus we cannot have both
items, as specified in the statement of the lemma. □

Lemma 4.3. If B* is a bin of the optimal packing containing a Y1 item, then
wB(B*) ≤ 6/5.

Proof. Assume otherwise for some bin B* containing a Y1 item, a. Since a cannot
fit with three or more items in any bin, we must have |B*| ≤ 3. We begin by observing
that if a has weight exceeding 3/5, then we can, without loss of generality, assume that a
is the Y1 item B2F packed in Bh. Otherwise, it would come from a bin preceding Bh in
the B2F packing, and would consequently be at least as large as the Y1 item in Bh. Thus
the Y1 item in Bh would fit in B* in place of a, and we may as well assume that it is a.

Case 1. Suppose |B*| = 2. Then certainly some item in B* must have weight
exceeding 3/5, and we can assume that a is packed in Bh. Let b be the other item in B*.
Since b would fit in Bh, either b was packed earlier and thus was not available, or the
item packed with a in Bh is at least as large as b. If b is packed by B2F in a Y1 bin after
Bh, then wB(b) = 1/5 since b cannot be of type X2 or Y2 (if it were, the item packed in Bh
could not be of size 1/3 or less). Thus wB(B*) ≤ 6/5 in this case. If b is packed before a, or
if b is packed after the Y1 bins, wB(b) ≤ 1 - wB(a) and so wB(B*) is at most 1 in this
case. We conclude that if |B*| = 2, wB(B*) cannot exceed 6/5.

Case 2. Suppose |B*| = 3. Let B* = {a, b, c} with s(b) ≥ s(c). Then s(b) < 1/3
and s(c) < 1/4, or else s(B*) > 1. Therefore, their weights are at most 1/3 and 1/4 respectively.
Consequently, we know that wB(a) must exceed 3/5 if the lemma is to fail. Thus we can
assume that a is the Y1 item in Bh. We now employ the same argument that we used in
Case 1 to prove that the sum of the weights of a and either b or c can be at most 1. If
both were available, then Bh would use two fallback items, so either b or c must be packed
before a. Then certainly the sum of the weight of a and the weight of that item is at most
1. If one is still available, and it is not packed in a Y1 bin after a, then it is no larger than
the item packed with a by B2F. Consequently, its weight is no greater, and the sum of
its weight and that of a is at most 1. If the available item is packed in a Y1 bin after Bh,
then its weight cannot be 2/5 since its size is at most 1/3. But if its weight is 1/5, the weight of
B* is at most 6/5. From this we conclude that both b and c must have weight exceeding
1/5.

Let d be the fallback item in Bh. Thus s(b) + s(c) > s(d), because s(d) ≤ 1/3 and
s(b) ≥ s(c) > 1/5. It could not be the case that both b and c were available when d
was packed, or else they would have replaced d. Suppose s(d) > s(b). Since s(d) +
s(any Y1 item) ≤ 1, and since d is larger than either b or c, whichever of these is packed
before d must be one of two fallback items in its bin and hence will have weight less than
1/5. Suppose s(b) ≥ s(d) > s(c). Then c would have fit with a and d in Bh had it been
available. Since it was not used, we conclude that c must be packed before d in a bin
with two fallback items, and hence has weight less than 1/5. The only remaining possibility
is that s(c) ≥ s(d). Now, however, any item no larger than d would fit in Bh with a and
d. Since none was placed there, none can have been left to be packed after the Y1 bins,
and therefore wB(d) is zero. In this event, since b and c are packed before d, wB(b) and
wB(c) are zero as well, contradicting the assumption that wB(B*) > 6/5. □

Lemma 4.4. The B2F weight of an optimal bin cannot exceed 6/5 unless the bin contains
either a Y2 item of B2F weight greater than 1/3 or an item of size less than 1/6 + Δ.

Proof. To obtain the proof, suppose otherwise for some B*. We know from Lemma
4.3 that B* cannot contain a Y1 item. It is easy to see then that |B*| ≥ 3.

Case 1. Suppose |B*| = 3. Then the only item of weight exceeding 1/3 can be an X2
item. Since any two such items and an item of size greater than 1/6 + Δ would be
too big to fit, there can be at most one item of weight exceeding 1/3 and wB(B*) ≤ 1/2 +
2(1/3) < 6/5.

Case 2. Suppose |B*| = 4. Suppose first that the largest item in B* is an X2 item.
There cannot be another item of size greater than 1/4, because then the sum of these sizes
would exceed 5/12 - Δ/2 + 1/4 + 2(1/6 + Δ) > 1. Items of size at most 1/4 (X4, Y4, X5) can have
weight at most 1/4. If any of these items were to have weight less than or equal to 1/5, then
wB(B*) would be at most 1/2 + 2(1/4) + 1/5 = 6/5. Thus all three items besides the X2 item
must be X4 or Y4 items of weight 1/4. But such items have size exceeding 1/5 and then
s(B*) > 5/12 - Δ/2 + 3(1/5), which is at least 1 if Δ ≤ 1/30. If Δ > 1/30, however, s(B*) >
5/12 - Δ/2 + 3(1/6 + Δ) > 1. Thus in all cases where |B*| = 4 and B* contains an
X2 item, wB(B*) ≤ 6/5.

Suppose now that the largest item is a Y2 item, which has weight less than or equal
to 1/3 by assumption. If there were two additional items of size greater than 1/4, we would
have s(B*) > 1/3 + 2(1/4) + 1/6 + Δ > 1. Thus there must be two items of size less than or
equal to 1/4 and hence of weight at most 1/4. Since there can be no item of weight exceeding
1/3, we must have wB(B*) ≤ 2(1/3 + 1/4) < 6/5.

Since |B*| = 4, there must be at least one item of weight exceeding 3/10, which must
be of type X3 or Y3 and of weight 1/3. If any item has weight less than or equal to 1/5, wB(B*)
would be at most 3(1/3) + 1/5 = 6/5. If there are two items of weight 1/4, we would still have
wB(B*) < 6/5. There cannot be four items of size greater than 1/4, so there must be an X4
or Y4 item of weight 1/4 and three X3 or Y3 items. At least two of the X3 or Y3 items must
have weight 1/3, and so there must be a bin in the B2F packing containing three X3 or Y3
items of weight 1/3. In particular, the last three-item bin is exceptional and must contain
three items no larger than those in B*. (Even if the items in B* are fallback items, there
must be such a three-item bin, and the last bin is exceptional.) If the item, x, of weight
1/4 were still available when this three-item bin was packed, then x and any other item of
weight 1/4 would replace the bin's last item. Thus x must be packed as a fallback item in
an earlier bin. The only way to have weight 1/4 would be in a bin with an X2 (or Y2) item
and another item of weight 1/4. In this event, however, x and any of the items in the last
three-item bin would fit with the X2 item since they fit with two other X3 or Y3 items.
(Note that x cannot be packed as a single fallback item in a Y1 bin, since any X3 or Y3
item would fit and be used instead of x.)

Case 3. Suppose |B*| = 5. If any item has size exceeding 1/3, s(B*) would be greater
than 1/3 + 4(1/6 + Δ) > 1. Similarly, not all items can have size exceeding 1/5. If all items are
no larger than 1/4 in size, then the weight of B* would be at most 6/5 since none of the items
could have weight more than 1/4 and at least one would have weight at most 1/5. Thus there
must be at least one X3 or Y3 item, and at least one X5 item.

Suppose there is an X3 item. If there are two additional items of size greater than
1/5, s(B*) > 5/18 - Δ/3 + 2(1/5) + 2(1/6 + Δ) > 1. Thus there is at most one item of weight
exceeding 1/4 and one additional item of size exceeding 1/5. Hence, wB(B*) ≤ 1/3 + 1/4 +
3(1/5) < 6/5.

Suppose finally that the largest item in B* is of type Y3. B* can contain at most
one such item, or else s(B*) > 2(1/4) + 3(1/6 + Δ) > 1. Also, B* cannot contain a Y3 item
and two X4 items, or else s(B*) > 1/4 + 2(5/24 - Δ/4) + 2(1/6 + Δ) > 1. However, if B*
contains three items each of weight less than or equal to 1/5, wB(B*) ≤ 1/3 + 1/4 +
3(1/5) < 6/5. There must be at least two items of size (and hence weight) no greater than 1/5,
or else s(B*) > 1/4 + 3(1/5) + 1/6 + Δ > 1. The only remaining possibility is for B* to contain
a Y3 item, a Y4 item, an X4 or Y4 item, and two X5 items. By Lemma 4.2, either the Y3
item has weight less than 1/3 or the Y4 item has weight less than 1/4. Either of these possibilities
contradicts the assumption that wB(B*) > 6/5. □

Lemma 4.5. There cannot be a Y1 item a with wB(a) ≥ 2/3 if there exist items b and
c with wB(b) = 1/3, s(c) > max(5/24 - Δ/4, 1/6 + Δ), and s(a) + s(b) + s(c) ≤ 1.

Proof. We shall show that under these conditions no bin of the optimal packing
can have a B2F weight exceeding 6/5. Suppose L contains such items and that, for some
optimal bin B*, wB(B*) > 6/5. As we have seen before, there is no loss of generality in
assuming that a is the Y1 item in Bh. We know from Lemma 4.4 that B* must contain
a Y2 item of weight greater than 1/3 or an item of size less than 1/6 + Δ. If there is a Y2 item
of weight exceeding 1/3, however, then there must have been such an item available when
Bh was packed. Since any such item is smaller than the sum of the sizes of b and c, it
would fit with a in Bh, contradicting the definition of Bh. Thus the only possibility is
for B* to contain an item, d, of size less than 1/6 + Δ. When a was packed, if d and any
other item of size at most that of b were available, they would be used in place of the
fallback item in Bh. Since wB(b) = 1/3, either b itself must have been available or b must
be packed in an earlier Y1 bin and some item no larger than b must have been available.
In either case, there must have been an item no larger than b available when Bh was
packed. Thus d must not have been available. But if d is packed in a Y1 bin before Bh,
it cannot be the only fallback item, since there must be an item of weight 1/3, no larger
than b, available that would fit with any Y1 item. Thus d must have weight no greater
than 1/6. At this point, we must consider the possible configurations for B*. Certainly B*
must contain at least three items.

Case 1. Suppose |B*| = 3. By Lemma 4.3, B* cannot contain an item of weight
greater than 1/2. Thus wB(B*) ≤ 1/2 + 1/2 + 1/6 < 6/5.

Case 2. Suppose |B*| = 4. B* must contain an item of weight greater than 1/3,
which can only be an X2 item by Lemma 4.3 and by the above arguments focusing on
the weight of Y2 items. Moreover, if there is not a second item of weight greater than
1/4, then we would have wB(B*) ≤ 1/2 + 1/4 + 1/4 + 1/6 = 7/6. If there is a second item larger than 1/4
in size, then there must be a second item whose size is less than 1/6 + Δ, or else s(B*) >
5/12 - Δ/2 + 1/4 + 1/6 + 1/6 + Δ > 1. Since this second small item will also have weight no
greater than 1/6, we again have that wB(B*) ≤ 1/2 + 1/3 + 2(1/6) = 7/6.
Case 3. Suppose |B*| = 5. B* must contain an item of weight greater than 1/4.
There cannot be two such items, or else s(B*) > 1. If any of the remaining items has
weight less than or equal to 1/5, then wB(B*) ≤ 1/3 + 2(1/4) + 1/5 + 1/6 = 6/5. But this
means that there must be one item whose size exceeds 1/4 and three additional items
each of whose sizes exceeds 1/5. Thus s(B*) > 1/4 + 3(1/5) + 1/6 > 1. Hence, in all cases,
wB(B*) ≤ 6/5.
To complete the proof of Lemma 4.5, we observe that wB(L) ≥ B2F(L) - 8 since
each nonexceptional bin has a weight of 1, while wB(L) ≤ (6/5) OPT(L) since each optimal
bin has a weight bounded above by 6/5. Combining these yields B2F(L) ≤ (6/5) OPT(L) +
8, contradicting the assumption that L was a counterexample for CFB. □
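
The closing step is the usual weighting-function chain. Written out, under the reading that at most eight bins are exceptional (weight zero), that every other B2F bin has weight exactly 1, and that the sum on the second line runs over the bins B* of an optimal packing, it is:

```latex
\begin{align*}
w_B(L) &\ge \mathrm{B2F}(L) - 8, \\
w_B(L) &= \sum_{B^*} w_B(B^*) \le \tfrac{6}{5}\,\mathrm{OPT}(L), \\
\text{hence}\quad \mathrm{B2F}(L) &\le w_B(L) + 8 \le \tfrac{6}{5}\,\mathrm{OPT}(L) + 8 .
\end{align*}
```

The same chain, with wF and FFD in place of wB and B2F, is what closes the proof of Lemma 3.2.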

5. Proof of the main result. We shall now employ our weighting function averaging
technique to obtain the final result. From Corollary 3.1 and Lemma 4.4 we know the
optimal bin configurations that may have "too much" weight from the respective FFD
or B2F weighting function, and that since FFD fails to achieve the required bound, any
Y2 item receives an FFD weight of at most 1/3. Also, from Lemma 4.5, we know that since B2F
fails, any Y1 item either cannot be packed extremely well or receives a B2F weight of 3/5.
The heart of the proof of the main result is now contained in the following lemma.

Lemma 5.1. If B* is any bin of the optimal packing of L, wA(B*) = (wF(B*) +
wB(B*))/2 ≤ 6/5.

Proof. To obtain the proof, suppose otherwise for some optimal bin B*. Clearly,
at least one of the two weighting functions must give B* a weight exceeding 6/5.

Case  1.  Suppose  wF(B*)  >  § .  Then  we  know  that  B*  must  contain  a  Y{  item,  a , 
and  that  |  B*  |  ^3.  If  1 2?*  |  =2,  then  the  second  item,  b ,  would  fit  with  a  when  a  was 
packed.  If  it  is  unavailable,  then  the  Y\  packing  rule  for  FFD  implies  that  b  cannot  have 
weight  exceeding  1  —  wF(a).  If  b  is  available,  then  the  item  packed  with  a  is  at  least  as 
large  as  b.  If  b  has  weight  less  than  or  equal  to  §,  then  wF{a)  +  wF(b)  ^  f ,  since  a  cannot 
have  weight  exceeding  f  unless  nothing  fits  with  it.  Since  we  already  know  that  there  are 
no  Y2  bins  in  the  FFD  packing  by  Lemma  3.2,  b  must  be  an  X2  item.  In  this  case, 
however,  a  must  also  be  packed  by  FFD  with  an  X2  item.  Thus  wF(a)  =  §  and 
wf(B*)<  f. 

Therefore,  we  may  assume  that  B *  =  {a,  b,  c},  where  s(a)  >  s(b)  ^  .s(c).  It  is 
easy  to  see  that  5(c)  <  \  and  s(b)  <  5,  or  else  s(i?*)  would  exceed  1.  Hence  wF(c)  ^  \ 
and  wF(b)  ^  implying  that  wF(a)  must  be  greater  than  § .  Let  Bt  denote  the  FFD  bin 
containing  a.  Since  b  would  fit  in  Bt  with  a  (or  any  other  Y\  item),  h >F(b)  S  1  —  wF(a), 
and  wf(B*)  S  f .  Note  further  that  c  must  be  an  X4  item,  or  else  its  weight  would  be  j 
and  B*  would  have  weight  less  than  or  equal  to  f . 

This  is,  for  those  readers  already  acquainted  with  FFD,  exactly  the  kind  of  situation 
where  one  expects  FFD  to  perform  poorly.  We  now  show  that,  in  this  case,  the  averaging 
process  with  B2F  permits  our  compound  algorithm  to  succeed. 

Suppose  wB(a)  =  \.  Unless  wB(b)  =  \  and  wB(c)  =  J,  we  have  wA(B *)^ 
(f  +  23/20)/2  =  f.  Now  wB(b)  =  5  implies  the  existence  of  a  bin  containing  three  items 
of  type  X3  or  Y3,  each  of  weight  5.  There  must  also  be  a  bin  containing  three  such  items 
each  no  larger  than  b ,  although  their  weight  may  be  zero  if  they  are  exceptional.  If  c 
were  available  when  this  three-item  bin  was  packed,  it  and  any  smaller  item  would 
replace  the  last  Y3  or  X3  item.  But  if  c  is  the  smallest  item  left,  it  is  either  /iast  and  hence 
is  exceptional,  or  it  is  a  fallback  item  and  has  weight  at  most  f  Therefore,  c  must  not  be 
available.  If  c  were  packed  in  a  Yx  bin,  the  items  of  the  three-item  bin  would  have  been 
used  unless  c  is  packed  with  a  second  fallback  item.  However,  it  then  has  weight  at  most 
5.  The  only  remaining  possibility  would  be  for  c  to  be  packed  with  another  X4  or  Y4  item 
and an X2 or Y2 item. In this event, however, c and any item from the three-item bin
would  have  fit  with  any  X2  or  Y2  item  and  been  used  instead.  Thus  we  know  that  it  is 
impossible  for  a  to  have  weight  f  in  the  B2F  weighting. 

Suppose  wB(a)  >  f .  It  must  be  that  a  is  in  a  Y\  bin  packed  no  later  than  Bh,  and 
as  argued  before  there  is  no  loss  of  generality  in  assuming  that  a  is  packed  in  Bh.  Let  d 
be  the  fallback  item  packed  with  a  in  Bh.  Thus  s(b)  +  5(c)  >  s(d),  and  both  b  and  c 
could  not  have  been  available  when  d  was  packed,  else  they  would  have  replaced  d.  If 
s(d)  >  s(b),  then  wB(d)  ^  max  { wB(b),  wB(c)}  and  whichever  of  b  or  c  is  not  available 
has  weight  less  than  or  equal  to  ( \)wB(d ).  This  causes  wb(B *)  to  be  at  most  1  + 
(2 )wB(b)  no  matter  where  b  was  packed  by  B2F.  Unless  b  has  weight  3,  this  quan¬ 
tity  is  at  most  23/20  and  wA(B*)  would  be  at  most  f .  But  this  is  precisely  the  situation 
ruled  out  by  Lemma  4.5.  If  s(b)  ^  s(d)  >  5(c),  then  we  reach  the  same  conclusion, 
since  it  must  still  be  that  wB(d)  ^  wB(b)  and  since  c  is  not  available  implying  wB(c)  £ 
(l)wB(d).  Finally,  if  5(c)  ^  s(d),  then  any  item  no  larger  than  d  would  fit  in  Bh 
along  with  a  and  d.  Since  none  was  used,  wB(b ),  wB(c)  and  wB{d)  are  all  zero. 

Case  2.  Suppose  wb{B*)>  f  and  B*  contains  a  Y2  item  of  B2F  weight  ex¬ 
ceeding  5. 

Certainly  \B*\  ^4,  since  no  bin  can  contain  an  item  of  size  greater  than  5  and 
four  additional  items. 

Suppose  \  B*\  =4.  Then  there  can  be  no  other  X2  or  Y2  items  and  at  most  one 
other  item  of  size  exceeding  4,  or  else  s(B*)  >  1.  If  all  other  items  are  at  most  4  in 
size,  then  wb(B*)  ^  4  +  3(4)  =  i  Since  wF(B*)  ^  4  +  3(4),  we  would  have  wA(B *)  < 
f .  Therefore  there  must  be  an  X3  or  Y3  item. 

Suppose  B*  contains  an  X3  item.  Then  there  must  also  be  either  an  item  of  size  at 
most  \  or  an  item  of  size  less  than  4  +  A.  To  see  this,  observe  that  if  all  items  have  size 
exceeding  4,  s(B *)  >  4  +  tl  -  A/3  +  5  ^  1  if  A  ^  If  all  items  have  size  greater  than 
or  equal  to  4  +  A,  s{B*)  >  4  +  ts  “  A/3  +  2(4  +  A)  ^  1  if  A  >  However,  if  there  is 
an  item  of  size  less  than  4  +  A,  then  wF(B*)  ^  2(4)  +  4  +  0  =  11/12  while  wb(B*)  £ 
4  +  4  +  2(4)  =  i-  If,  on  the  other  hand,  B*  contains  an  X5  item,  then  wb(B*)  ^  4  + 
4  +  4  +  4  =  77/60  while  wF(£*)  ^  2(4)  +  4  +  5  =  67/60.  In  either  case,  wA(B*)  ^  f. 

Suppose  B*  contains  a  Y3  item,  x.  Since  vt^x)  ^  js,  wF(B*)  ^  4  +  13  +  2(4)  = 
11/10.  Thus  wb{B*)  must  be  more  than  13/ 10,  or  else  wA(B*)  cannot  exceed  f .  This 
implies  that  wB(x)  must  be  4-  In  this  event,  there  must  be  a  three-item  bin  with  three  Y3 
items  each  of  weight  4-  If  there  is  a  Y2  item  of  weight  4,  it  must  come  from  an  earlier  bin 
containing  exactly  two  Y2  items.  The  second  of  these  items  would  have  been  replaced, 
however,  by  any  two  of  the  Y3  items,  since  any  Y2  item  will  fit  with  any  two  Y3  items. 
Thus  there  can  be  no  Y2  items  of  weight  \  in  the  B2F  packing,  and  the  weight  of  the  Y2 
item  must  be  § .  Therefore,  wb(B*)  ^  §  +  4  +  2(4)  <  13/ 10. 

Suppose  now  that  \B*\  =3.  If  there  is  no  X2  item  or  if  there  is  an  item  of  size  less 
than  4  +  A,  then  wF(B*)  ^  1  and  wa(B*)  <  f.  Thus  we  may  assume  that  B*  contains 
an  X2  item  and  that  its  remaining  item  is  at  least  4  +  A  in  size.  Even  if  the  small  item, 
y,  is  of  type  X3l  then  wF(B*)  is  at  most  4  +  2(4)  =  b  If  the  Y2  item,  x,  has  B2F  weight 
less  than  4,  wb(B*)  ^  4  +  5  +  3  and  wA(B*)  ^  f.  The  only  way  that  x  can  have  weight 
4  is  to  be  in  a  two-item  bin,  Bj,  with  another  Y2  item.  This  means  that  y  must  not  have 
been  available  when  Bj  was  packed,  since  it  would  have  fit  with  x  and  any  Y2  item  (it 
fits  in  B*  with  x  and  an  X2  item).  Thus  y  cannot  have  weight  4  unless  there  is  a  three- 
item  bin  consisting  of  items  no  larger  than  y .  These  items,  however,  must  have  been 
available  when  x  was  packed,  and  thus  y  still  cannot  have  weight  4-  Therefore,  the  max¬ 
imum  B2F  weight  for  y  is 

If  y  is  a  Y3  item,  then  wF(y)  =  and  wF(B*)  ^4  +  3  +  I^  =  11/10.  Thus  wb{B*) 
is  at  most  4  +  5  +  ^  =  13/ 10  and  wA(B*)  =  f. 
Therefore, y must be an X3 item. It must also be that A exceeds or else s(B*) >
Y2  ~  A/2  +  3  +  ^  ”  A/3  ^  1.  Let  Bt  be  the  bin  containing  the  Y2  item,  x,  in  the  FFD 
packing.  We  know  from  Lemma  3.2  that  Bt  is  a  Yx  bin.  Let  z  be  the  X2  item  in  B *.  If  z 
were  packed  in  a  Yx  bin  in  the  FFD  packing,  its  weight  would  be  \  and  wA{B*)  would 
be  .  Thus  z  must  have  been  available  when  Bt  was  packed.  Since  it  was  not  used  in 
place  of  x,  the  size  of  the  Yx  item  in  Bt  exceeds  1  -  s(z).  But  z  fits  with  x  and  y,  so 
5(z)<  1  -  £-(£  -  A/3)  =  +  A/3.  Then  1  -5(z)  =  1 1/ 18  -  A/ 3,  which  is  greater 

than  |  -  2A  if  A  >  ^.  The  weighting  function  for  FFD  gives  weight  at  most  b  to  a  Y2 
item  packed  with  such  a  large  Y{  item,  and  again  we  have  wA(B*)  ^  f . 

Case  3.  Suppose  wb(B*)  >  f  and  B*  contains  an  item  a,  where  s(a)  <  z  +  A. 

We  know  that  B*  contains  neither  a  Yx  item  nor  a  Y2  item  of  weight  exceeding  ^ 
by  Lemma  4.3  and  Case  2  above,  respectively.  We  also  know  that  wB(a)  is  at  most  \ 
since  s(a)  <  By  Lemma  3.3,  we  know  further  that  Hy(£*)  ^  1,  so  that  if  wA(B*)  is 
to  exceed  f ,  we  must  have  wb(B*)  >  5.  Thus  |B*|  >3,  since  any  two  items  with  a  can 
each  have  weight  at  most  \ . 

Suppose  |  I  =  4.  Then  there  must  be  an  X2  item,  or  else  wb(B*)  £  3(|)  +  i 
There  can  be  at  most  one  additional  item  exceeding  \  in  size,  or  else  s(B*)  >  n  - 
A/2  +  2(|)  +  £  >  1.  But  then  wb(B*)  ^  5  +  5  +  2(|)  < 

Suppose  |  5*  |  =5.  There  cannot  be  an  X2  item,  or  else  s(B*)  >  jl  -  A/2  + 
4(g)  >  1.  Nor  can  there  be  two  items  of  size  greater  than  or  else  s(B*)  > 
2(!)  +  3(g)  =  1.  Finally,  if  only  one  item  has  size  exceeding  3,  wb(B*)  ^  5  + 

4(J)<|.  □ 

Theorem 5.1. min {FFD(L), B2F(L)} ≤ (6/5) OPT(L) + 8.

Proof. To obtain this inequality, we observe that our presumed counterexample
obeys min {FFD(L), B2F(L)} − 8 ≤ (FFD(L) − 8 + B2F(L) − 8)/2 ≤
(wF(L) + wB(L))/2 = wA(L) by our definitions for wF, wB, and wA, while wA(L) ≤
(6/5) OPT(L) by Lemma 5.1. □
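
The chain of inequalities in this proof can be written out step by step. The display below is a sketch under stated assumptions: that FFD(L) − 8 ≤ wF(L) and B2F(L) − 8 ≤ wB(L), as used in the proof, and that the per-bin bound of Lemma 5.1 is 6/5, the value implied by averages such as (11/10 + 13/10)/2 that appear in its case analysis.

```latex
\begin{align*}
\min\{\mathrm{FFD}(L),\,\mathrm{B2F}(L)\} - 8
  &\le \frac{(\mathrm{FFD}(L)-8) + (\mathrm{B2F}(L)-8)}{2}
     && \text{(a minimum never exceeds an average)}\\
  &\le \frac{w_F(L) + w_B(L)}{2}
     && \text{(weight lower bounds for the two packings)}\\
  &= w_A(L) = \sum_{B^*} w_A(B^*)
     && \text{(sum over the bins of the optimal packing)}\\
  &\le \tfrac{6}{5}\,\mathrm{OPT}(L)
     && \text{(Lemma 5.1 applied to each optimal bin)}.
\end{align*}
```

Adding 8 to both ends gives the stated bound, which contradicts the existence of a counterexample.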

6. Remarks. We have limited our analysis to proving that, for any list, either the
FFD or the B2F algorithm will asymptotically use at most 6/5 times the optimal number of bins.
However,  we  have  been  unable  to  find  examples  that  are  even  close  to  this  bound.  In 
fact,  the  only  examples  we  have  been  able  to  contrive  that  exceed  |  the  optimum  depend 
heavily  on  the  modification  that  we  introduced  to  B2F  to  simplify  our  proof.  For  these 
instances,  this  modification  forces  the  B2F  packing  to  be  the  same  as  the  FFD  packing. 
If  “small”  items  are  not  held  back,  the  exact  bound  might  be  significantly  better  (although 
a  proof  of  this  may  well  be  extremely  difficult). 

Our weighting function averaging technique actually proves that, even if both
algorithms produce particularly egregious packings for some list, the average of the number
of bins used by FFD and the number used by B2F is asymptotically at most 6/5 times the optimal
number of bins for that list. Presumably, the minimum may always be considerably less
than this upper bound on the average. Furthermore, we remark that the additive constant
we have used (eight) is much higher than necessary. Instead of assigning a weight of zero
to every exceptional item, we could assign a weight that agrees with an item's type, and
easily reduce this constant. Nevertheless, because we believe that the 6/5 coefficient is itself
inflated, the additive constant appears to be of little significance.


Appendix.  Bin  packing  results  for  B2F  alone.  We  seek  to  determine  the  worst-case 
behavior  of  the  B2F  algorithm.  Before  doing  so,  however,  we  briefly  discuss  some  other 
aspects  of  this  approach  to  bin  packing. 

We could extend the idea of “best 2 fit” to “best j fit,” for arbitrary j > 2. It seems
likely that the expected performance of these more complex algorithms might be better,
although the worst-case performance can be shown to be worse, approaching a number
greater than 1.3 as j grows without bound. Simple tests using a uniform distribution for
item sizes seem to back up the improved expected case, although the run time increases
rapidly.

B2F  can  also  be  used  in  the  multifit  approach  to  multiprocessor  scheduling.  Again, 
its  worst-case  performance  is  poorer  than  that  of  FFD.  In  [3],  it  is  shown  that  B2F’s 
asymptotic  worst-case  bound  is  precisely  f,  while  it  has  been  proved  in  [4]  that  FFD  can 
be  implemented  to  ensure  a  tight  bound  of  72/61 . 

Returning to bin packing, Fig. 3 depicts an example illustrating that B2F may require,
asymptotically, as many as 5/4 times the optimal number of bins.

To prove that the 5/4 ratio cannot be exceeded by B2F, we modify the algorithm
slightly in that items less than or equal to 1/5 the bin size will be held back and packed by
the FFD algorithm. This certainly does not affect the example illustrated in Fig. 3, but
it allows us to assume that no items of size 1/5 or less are used in packing L, which we
now presume to be a minimal counterexample. This reduces the number of cases we must
investigate, thereby simplifying our proof (although it probably detracts from the expected
performance of the algorithm).

Lemma A. Every item in L has size less than 3/5.

Proof. Let b be the largest item in L and suppose s(b) ≥ 3/5. Then b is packed in B1
by the B2F rule. Removing the items of B1 cannot change the remainder of the packing.
Since s(b) ≥ 3/5, |B1| ≤ 2 and, if |B1| = 2, then B1 contains the largest item that would
fit with b in a bin of size 1. If the item or items of B1 are removed from L, then both
B2F(L) and OPT(L) can easily be reduced by one, contradicting the presumed minimality
of L with respect to B2F. □

Theorem A. B2F(L) ≤ (5/4) OPT(L) + 4.

Proof. We classify an item, x, by its size so that if 1/(i + 1) < s(x) ≤ 1/i, then x
is of type Xi. The reasoning above shows that all items are of types X1, X2, X3, or X4,
and items of type X1 are less than 3/5 in size. We now define a weighting function w on
the items of L based on the B2F packing.
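
The size classification used in this proof can be computed directly. The following is a minimal sketch, assuming bins of capacity 1 and exact rational sizes; the function name `item_type` is hypothetical and does not come from the paper.

```python
from fractions import Fraction
from math import floor


def item_type(size: Fraction) -> int:
    """Return the index i with 1/(i+1) < size <= 1/i, for a bin of capacity 1."""
    if not 0 < size <= 1:
        raise ValueError("size must lie in the interval (0, 1]")
    # 1/(i+1) < size <= 1/i is equivalent to i <= 1/size < i+1.
    return floor(1 / Fraction(size))


for s in (Fraction(11, 20), Fraction(1, 2), Fraction(3, 10), Fraction(1, 4)):
    print(s, "is of type X%d" % item_type(s))
```

Exact fractions are used so that boundary sizes such as 1/2 or 1/3 fall in the intended class rather than being shifted by floating-point rounding.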

Fig. 3. Worst-case example for B2F. (b) OPT(L) = 1 + 4 + ··· + 4^(k−2) + 4^(k−1) =
(4^(k−1) − 1)/3 + 4^(k−1) = (4/3)4^(k−1) − 1/3.
B2F(L)/OPT(L) = (5(4^(k−1)) − 2)/(4(4^(k−1)) − 1) → 5/4 as k → ∞.

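
The ratio quoted in the caption of Fig. 3 can be evaluated for a few values of k to see how it approaches 5/4. This is only an arithmetic check of the caption's formula; the function name `b2f_ratio` is hypothetical, and k is assumed to be a positive integer indexing the family of instances.

```python
from fractions import Fraction


def b2f_ratio(k: int) -> Fraction:
    """B2F(L)/OPT(L) for the k-th instance, using the formula in the caption."""
    p = 4 ** (k - 1)
    return Fraction(5 * p - 2, 4 * p - 1)


for k in (1, 2, 3, 6, 10):
    r = b2f_ratio(k)
    print(k, r, float(r))
```

The gap to the limit is 5/4 − (5p − 2)/(4p − 1) = 3/(4(4p − 1)) with p = 4^(k−1), so the ratio increases toward 5/4 without reaching it.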
If B is any bin with four items in it, each item is assigned a weight of 5. Suppose B
is  a  bin  containing  an  Xx  item,  b .  Then  if  |  B\  =  3,  w(b)  =  fs,  and  the  other  two  items 
are  each  assigned  a  weight  of  If  |  B  |  =2,  then  w(b)  =  fs  and  the  other  item  is  assigned 
a  weight  of  is  if  the  other  item  is  of  type  X2.  Otherwise,  w(b)  =  f  and  the  remaining  item 
is  assigned  a  weight  of  5. 

Suppose  the  largest  item  in  B  is  of  type  X2.  Then  if  |  B\  =2,  each  item  must  be  of 
type  X2 ,  and  is  assigned  a  weight  of  §,  except  possibly  for  the  last  bin  containing  an  X2 
item.  If  the  last  bin  containing  an  X2  item  has  only  2  items  in  it,  it  will  be  classified  as 
exceptional  (as  will  its  items).  All  exceptional  items  are  given  weight  zero.  (This  is  an 
unnecessarily  strict  weight  reduction,  accounting  for  the  constant  4  in  the  theorem.  A 
more  careful  analysis  using  larger  weights  for  the  exceptional  items  could  likely  reduce 
this  constant  to  1.)  If  |  B  |  =  3  and  B  contains  two  X2  items,  each  is  given  a  weight  of 
jo  and  the  remaining  item  is  given  a  weight  of  5.  If  B  contains  only  one  X2  item,  then  it 
is  given  a  weight  of  §  and  the  other  two  are  each  given  a  weight  of  5.  If  the  largest  item 
is  of  type  X3,  then  |  B\  =  3  implies  all  three  items  are  of  type  X3,  except  possibly  for  the 
last  such  bin  (which  is  also  classified  as  exceptional).  All  three  X3  items  in  such  a  bin 
are  given  a  weight  of  73.  One  additional  exceptional  bin  shall  be  identified.  If  the  last  X2 
item  of  size  exceeding  js  is  packed  with  an  X2  item  of  size  less  than  then  this  bin  is 
classified  as  exceptional,  and  its  items  assigned  weights  of  zero. 

The  definition  of  w  is  summarized  in  Table  6. 

We now show that each bin B* of the optimal packing must satisfy w(B*) ≤ 1.
This, together with the observation that w(B) = 4/5 for each nonexceptional bin in the
B2F packing, will complete the proof of Theorem A.

Suppose B* is a bin of the optimal packing with w(B*) > 1. Clearly, |B*| > 1. (If
B* contains an exceptional item, then after removing the item w(B*) would still exceed
1. Thus it is enough to show that w(B*) ≤ 1 for bins not containing exceptional items.)

Case 1. Suppose |B*| = 2.

If  neither  item  has  weight  greater  than  § ,  then  w(B* )  ^  j  <  1 .  Thus  B*  must  contain 
an  item  of  type  Xx .  The  weight  of  this  item  is  less  than  or  equal  to  §  and  the  weight  of 
an  X2  item  is  less  than  or  equal  to  §.  Since  B *  cannot  contain  two  X1  items,  w(£*)  ^ 

3.  a  =  1 

5  +  5 

Case 2. Suppose |B*| = 3.

The  largest  item  in  must  have  a  weight  exceeding  5,  and  so  must  be  of  type  Xx 
ox  X2. 


Table  6 

Weighting function w used in analysis of B2F alone.
Suppose the largest item is of type X1, so that B* = {b, c, d}, where s(b) >
s(c)  ^  s(d).  Then  neither  c  nor  d  can  be  of  type  X2.  If  both  are  of  type  X4 ,  then 
w(B*)  ^  1.  Both  cannot  be  of  type  X3,  since  s(b )  +  .s(c)  +  s(d)  cannot  exceed  I.  Thus 
c  is  of  type  X3,  w(c)  d  is  of  type  X4,  and  w{d)  If  both  c  and  d  were  available 
when  b  was  packed,  then  either  b  was  packed  with  an  X2  item  or  with  two  other  items. 
In  either  case,  w(b)  =  and  w(i?*)  ^  -ft  +  -^  +  f  =  1.  Therefore,  w(b)  must  be  |.  If  c 
is  packed  before  b ,  then  w(c)  ^  f  and  w(B*)  ^  1.  If  </is  packed  before  b ,  then  it  must 
be  packed  with  an  Xx  item  and  another  item,  since  c  would  have  fit  and  was  not  used. 
Hence  w(b)  -  and  w(i?*)  +  b  +  1- 

Thus  the  largest  item  in  B*  must  be  of  type  X2.  If  there  is  only  one  X2  item, 
w(B*)  ^  +  2(13)  <  1.  Thus  B*  =  {b,  c ,  d}  with  b  and  c  both  of  type  X2,  where 

s(b)  ^  s(c).  If  w(d)  ^  f,  then  w(B*)  ^  2(f)  +  f  =  1.  Thus  d  is  an  X3  item  and 
w(d)  =  n-  Also,  w(b)  =  w(c)  =  f,  since  otherwise  w(i?*)  ^f  +  ll  +  T5<l*  Since 
s(d)  >  f ,  it  must  be  that  5(c)  <  f .  If  d  were  packed  before  c,  then  w(d)  would  only  be 
f ,  so  that  d  must  be  available.  In  order  for  w{c)  to  be  f ,  c  must  be  packed  by  B2F  in  a 
bin  B  =  {c,  x}  or  {c,  y,  z}.  If  |  B\  =2,  then  since  d  would  not  fit  in  B,  s(x)  >  s(b) 
and  b  must  be  in  a  bin  with  an  X2  item  and  one  other  item,  contradicting  w(b)  =  f.  If 
|  B  |  =3,  then  neither  y  nor  z  can  be  of  type  X2 .  Since  there  must  be  an  X2  item,  u ,  left 
(or  else  B  would  be  exceptional)  and  since  u  is  smaller  than  c,  the  B2F  rule  would 
have  placed  c,u,  and  an  X3  item  in  B  since  c,  u,  and  d  would  have  fit.  Thus  it  is  impos¬ 
sible  to  have  w(b)  -  w(c)  =  f  while  w(d)  =  b  and  we  conclude  that,  in  any  event, 
w(B *)^  1. 

Case 3. Suppose |B*| = 4.

B*  cannot  contain  an  Xx  item,  since  f  +  3(f)  >  1.  Neither  can  it  contain  two  X2 
items,  since  2(f)  +  2(f)  >  1.  Similarly,  it  cannot  contain  four  X3  items,  since  each  has 
size  greater  than  f .  However,  if  it  contains  three  items  of  type  X3  and  one  of  type  X4, 
then  w(B*)  ^  3  (33)  +  f  =  1.  Thus  B*  must  contain  exactly  one  X2  item.  If  the  other 
three  items  have  weight  less  than  or  equal  to  f,  w(B*)  ^  f  +  3(f)  =  1.  If  there  were 
two  X3  items,  ■s,(i?*)>f-F2(f)  +  f>  1.  Thus  B*  must  contain  exactly  one  X3  item. 
Let  B*  ~  { b,c,d,e },  with  b  of  type  X2i  and  c  of  type  X3 .  If  w(b)  <  f,  then 
w(B*)  =  h)  +  13  +  2(f)  <  1.  Thus  w(b)  =  f  and  w(c)  =  13.  This  means  c  must  be 
available  when  b  is  packed. 

If  b  is  the  largest  item  in  some  bin  B  of  the  B2F  packing,  then  B  would  contain  two 
X2  items  and  another  item  since  s(&)  +  s(c)  +  s(d)  +  s(e)  ^  1  implies  that  2 s(b)  + 
s(c)  <  1.  This  cannot  happen,  however,  so  it  must  be  that  B  =  {x,  b}  where  s(x)  > 

1  -  2 s(c),  since  b  was  not  replaced  by  two  smaller  items.  Because  .s(c)  <  1  —  f  —  f  = 
^3,  we  know  s(x)  >  Thus  B  is  the  third  exceptional  bin  ( s(b )  <  1  —  4  —  §  =  sj)  and 
again  w(B*)  ^  1. 

Now, to complete our proof of Theorem A, we note that w(B) = 4/5 for all but at
most four B2F bins (the three exceptional bins and the last bin), so that Σ_{x∈L} w(x) ≥
(4/5)(B2F(L) − 4). At the same time, w(B*) ≤ 1 for all B* in the optimal packing
ensures Σ_{x∈L} w(x) ≤ OPT(L). Combining these two inequalities yields B2F(L) ≤
(5/4) OPT(L) + 4, as desired. □
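
The final combination step can be spelled out as follows. This is a sketch assuming, as the 5/4 ratio of Theorem A and the weights of four-item bins suggest, that every nonexceptional B2F bin has total weight 4/5 and that at most four bins are excluded.

```latex
\begin{align*}
\tfrac{4}{5}\bigl(\mathrm{B2F}(L) - 4\bigr)
  &\le \sum_{x \in L} w(x) \le \mathrm{OPT}(L), \\
\mathrm{B2F}(L) - 4 &\le \tfrac{5}{4}\,\mathrm{OPT}(L), \\
\mathrm{B2F}(L) &\le \tfrac{5}{4}\,\mathrm{OPT}(L) + 4 .
\end{align*}
```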
SEMIKERNELS, QUASI KERNELS, AND GRUNDY
FUNCTIONS  IN  THE  LINE  DIGRAPH* 

H.  GALEANA-SANCHEZf,  L.  PASTRANA  RAMIREZ*,  AND  H.  A.  RINCON-MEJIA* 

Abstract. It is proved that the number of semikernels (quasi kernels) of a digraph D is less than or equal
to the number of semikernels (quasi kernels) of its line digraph L(D). It is also proved that the number of
Grundy functions of D is equal to the number of Grundy functions of its line digraph L(D) (in the case where
every vertex of D has indegree at least one).

Key words. Grundy function, kernel, line digraph, quasi kernel, semikernel

AMS(MOS)  subject  classification.  05C20 

1. Introduction. For general concepts we refer the reader to [1]. Let D = (X, U)
be a digraph (we also write X = V(D) and U = A(D)). A set K ⊆ X is said to be a
kernel if it is both independent (a vertex in K has no successor in K) and absorbing (a
vertex not in K has a successor in K).
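
The two conditions in this definition translate directly into set tests. The sketch below is illustrative only: it assumes a digraph is stored as a dictionary mapping each vertex to its set of successors, and the function names are hypothetical.

```python
from typing import Dict, Hashable, Set

Digraph = Dict[Hashable, Set[Hashable]]  # vertex -> set of successors


def is_independent(d: Digraph, k: Set[Hashable]) -> bool:
    """True if no vertex of k has a successor inside k."""
    return all(not (d[v] & k) for v in k)


def is_absorbing(d: Digraph, k: Set[Hashable]) -> bool:
    """True if every vertex outside k has at least one successor inside k."""
    return all(d[v] & k for v in d if v not in k)


def is_kernel(d: Digraph, k: Set[Hashable]) -> bool:
    """A kernel is a vertex set that is both independent and absorbing."""
    return is_independent(d, k) and is_absorbing(d, k)


path = {"a": {"b"}, "b": set()}  # single arc a -> b
print(is_kernel(path, {"b"}), is_kernel(path, {"a"}))
```

On the single arc a -> b, the set {b} passes both tests, while {a} is independent but not absorbing.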

This concept was introduced by Von Neumann [10] and it has found many
applications [1, p. 304], [2]. Several authors have been investigating sufficient conditions for
the existence of kernels in digraphs, namely, Von Neumann and Morgenstern [9],
Richardson [11], Duchet and Meyniel [4], [5], and Galeana-Sanchez and
Neumann-Lara [7].

In [8] Harminc proved that the number of kernels of a digraph is equal to the
number of kernels in its line digraph. In this paper we find similar relations for concepts
closely related to the concept of kernel, and we survey the theorems relating these concepts.

Definition 1.1 [10]. A semikernel S of D is an independent set of vertices such
that for every z ∈ (V(D) − S) for which there exists an Sz-arc there also exists
a zS-arc.

Definition 1.2 [3]. A quasi kernel Q of D is an independent set of vertices such
that X = Q ∪ Γ⁻(Q) ∪ Γ⁻(Γ⁻(Q)) (where for any A ⊆ X, Γ⁻(A) = {x ∈ X | x has a
successor in A}).
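
Definitions 1.1 and 1.2 can be checked mechanically on small digraphs. The following sketch assumes the same successor-set dictionary representation as a working convention (not one taken from the paper); all function names are hypothetical.

```python
from typing import Dict, Hashable, Set

Digraph = Dict[Hashable, Set[Hashable]]  # vertex -> set of successors


def is_independent(d: Digraph, s: Set[Hashable]) -> bool:
    """True if no vertex of s has a successor inside s."""
    return all(not (d[v] & s) for v in s)


def gamma_minus(d: Digraph, a: Set[Hashable]) -> Set[Hashable]:
    """Vertices that have at least one successor in a."""
    return {x for x in d if d[x] & a}


def is_semikernel(d: Digraph, s: Set[Hashable]) -> bool:
    """Independent, and every outside vertex reached from s has an arc back into s."""
    if not is_independent(d, s):
        return False
    reached = set().union(*(d[v] for v in s)) - s  # z outside s with an Sz-arc
    return all(d[z] & s for z in reached)


def is_quasi_kernel(d: Digraph, q: Set[Hashable]) -> bool:
    """Independent, and every vertex reaches q along at most two arcs."""
    first = gamma_minus(d, q)
    second = gamma_minus(d, first)
    return is_independent(d, q) and (q | first | second) == set(d)


cycle3 = {"a": {"b"}, "b": {"c"}, "c": {"a"}}
print(is_quasi_kernel(cycle3, {"a"}), is_semikernel(cycle3, {"a"}))
```

On the directed 3-cycle, {a} is a quasi kernel but not a semikernel. Under this reading of Definition 1.1 the empty set satisfies the semikernel conditions vacuously.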

Definition 1.3 [1, p. 312]. A nonnegative integer function g(x) is called a Grundy
function of D if, for every vertex x, g(x) is the smallest nonnegative integer which does
not belong to the set {g(y) | y ∈ Γ⁺(x)}.

This concept, originated by Grundy for digraphs without directed cycles, was
extended by Berge and Schützenberger.

The Grundy function can also be defined as a function g(x) such that

(1) g(x) = k > 0 implies that for each 0 ≤ j < k there is a y ∈ Γ⁺(x) with
g(y) = j;

(2) g(x) = k implies that each y ∈ Γ⁺(x) satisfies g(y) ≠ k.
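
Both formulations amount to requiring that g(x) be the minimum excluded value ("mex") of g over the successors of x. The sketch below assumes a successor-set dictionary for the digraph; the helper names are hypothetical, and the recursive construction applies only to digraphs without directed cycles.

```python
from typing import Dict, Hashable, Set

Digraph = Dict[Hashable, Set[Hashable]]  # vertex -> set of successors


def mex(values: Set[int]) -> int:
    """Smallest nonnegative integer not contained in values."""
    n = 0
    while n in values:
        n += 1
    return n


def is_grundy_function(d: Digraph, g: Dict[Hashable, int]) -> bool:
    """Check Definition 1.3 at every vertex."""
    return all(g[x] == mex({g[y] for y in d[x]}) for x in d)


def grundy_of_acyclic(d: Digraph) -> Dict[Hashable, int]:
    """Grundy function of a digraph assumed to have no directed cycles."""
    g: Dict[Hashable, int] = {}

    def visit(x: Hashable) -> int:
        if x not in g:
            g[x] = mex({visit(y) for y in d[x]})
        return g[x]

    for x in d:
        visit(x)
    return g


dag = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
print(grundy_of_acyclic(dag))
```

When directed cycles are present the checker still applies, but a Grundy function may fail to exist (directed 3-cycle) or fail to be unique (directed 4-cycle).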

Theorem 1.1 [3]. Every finite digraph has a quasi kernel. A generalization of this
theorem was obtained by Duchet, Hamidoune, and Meyniel [6].

Theorem 1.2 [10]. If D is a digraph such that every induced subdigraph has a
nonempty semikernel then D has a kernel.

Theorem 1.3 [1, p. 313]. If D is a digraph such that every induced subdigraph
has a kernel then D possesses a Grundy function.
Summary. We consider the problem of transforming a list L of records sorted
on some key into two sublists L1 and L2 where, for each distinct key in
L, L1 contains the first record of L that possesses the key and L2 contains
all records of L with duplicate keys. We desire that our duplicate-key
extraction algorithm perform the transformation in place and be stable (that is,
records within each sublist must obey the original order given by L). This
operation is useful in database and related file processing environments
whenever only distinct keys need be considered. Moreover, stability in
extraction ensures that L can be efficiently restored at a later time with a stable
merge of L1 and L2. Any procedure for performing duplicate-key extraction
on a list of size n must require at least O(n) time and O(1) extra space,
although the obvious algorithm for achieving either bound alone violates
the other bound. We design a stable algorithm, using block-rearrangement
techniques, and show that it is optimal in the theoretical sense that it achieves
both lower bounds simultaneously. We also prove that its worst-case number
of key comparisons and record exchanges sum to no more than 6n,
suggesting that the algorithm has practical application as well.

1.  Introduction 

Suppose  that  we  are  given  a  list  L  with  n  records  sorted  on  some  key,  and 
that  some  records  may  possess  the  same  key.  A  variety  of  questions  about 
such  lists  focus  only  on  distinct  keys.  For  example,  one  might  want  to  know 
whether  a  list  contains  a  particular  key,  whether  two  or  more  lists  contain 
the  same  keys,  whether  all  keys  are  present  within  a  certain  range,  etc.  Repeated 
queries of this nature frequently arise in database and file processing
environments, and can often be most efficiently addressed by first extracting duplicate
keys from L, leaving only one copy of each distinct key.

* A preliminary version of a portion of this paper [HL1] was presented at the 24th Annual
Allerton Conference on Communication, Control and Computing held in Monticello, Illinois, in
October, 1986
grants ECS-8403859 and MIP-8603879, and by the Office of Naval Research under contract
N00014-88-K-0343
If n is large, then using an additional list to hold a copy of each of L's
distinct keys may require a prohibitive amount of additional storage. Therefore,
we focus our attention on schemes that extract duplicates in-place, so that L
is transformed into the concatenation of two sublists, L1 and L2, where L1
contains one record for each distinct key in L and L2 contains L − L1. We
allow the use of a few extra memory cells to aid in the transformation, but
their total number must be constant. In-place extraction, however, dictates that
the operation be invertible (since L is altered), so that we can restore L to
its original sequence if necessary. Hence we ask that L1 contain the first copy
of each distinct key and that our algorithm be stable, by which we mean that
keys within each sublist retain the relative order they held within L. Clearly
a subsequent in-place, stable merge of L1 and L2 restores L (see, for example,
[Tr, SS2, HL4] for increasingly-efficient, asymptotically-optimal versions of
such a merge).
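
To illustrate why stability makes the extraction invertible, the sketch below merges L1 and L2 back into L. It is deliberately not in-place (the cited papers address that harder problem) and assumes records are hypothetical (key, payload) pairs; ties are resolved in favour of L1 because L1 holds the first copy of each key.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # hypothetical layout: (key, payload)


def stable_merge(l1: List[Record], l2: List[Record]) -> List[Record]:
    """Merge two key-sorted lists; on equal keys the record from l1 goes first."""
    merged: List[Record] = []
    i = j = 0
    while i < len(l1) and j < len(l2):
        if l1[i][0] <= l2[j][0]:
            merged.append(l1[i])
            i += 1
        else:
            merged.append(l2[j])
            j += 1
    merged.extend(l1[i:])
    merged.extend(l2[j:])
    return merged


original = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")]
l1 = [(1, "a"), (2, "c"), (3, "d")]  # first record of each distinct key
l2 = [(1, "b"), (3, "e"), (3, "f")]  # remaining duplicates, original order
print(stable_merge(l1, l2) == original)
```

If either sublist had lost its internal order during extraction, the same merge would no longer reproduce L.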

Any algorithm to solve this problem needs Ω(n) time since, in general, every
key must be examined at least once to determine whether it is duplicated within
L. There is, of course, an obvious way to construct L1 and L2 in O(n) time,
by a linear scan of L and temporary use of a list of storage cells separate
from L. This simple method is, unfortunately, unacceptable since Ω(n) additional
space must be available. Similarly, there is a straightforward way to construct
L1 and L2 in-place in O(1) extra space, by finding each new element of L1
in turn and moving it to its final position by shifting duplicates out of the
way. In this case, however, Ω(n²) time may be required just to move records.
In the sequel, we will show that a more complicated strategy, which uses O(√n)
blocks of size O(√n), solves the problem in both O(n) time and O(1) space,
and hence is optimal to within a constant factor.
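
The two obvious methods described above can be sketched as follows. This is one possible reading of the prose, not the block-based algorithm of the paper; records are assumed to be hypothetical (key, payload) pairs, the list is assumed sorted by key, and both functions return the length of L1.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # hypothetical layout: (key, payload)


def extract_with_extra_space(records: List[Record]) -> int:
    """O(n) time but O(n) extra space: one linear scan into two side lists."""
    firsts: List[Record] = []
    dups: List[Record] = []
    for rec in records:
        if firsts and firsts[-1][0] == rec[0]:
            dups.append(rec)      # key already seen: goes to L2
        else:
            firsts.append(rec)    # first record with this key: goes to L1
    records[:] = firsts + dups
    return len(firsts)


def extract_in_place_quadratic(records: List[Record]) -> int:
    """O(1) extra space, but a quadratic number of record moves in the worst case."""
    boundary = 0  # records[:boundary] is L1 so far; records[boundary:i] holds duplicates
    for i in range(len(records)):
        if boundary == 0 or records[boundary - 1][0] != records[i][0]:
            rec = records[i]
            # Shift the duplicates seen so far one place right, preserving their order.
            for j in range(i, boundary, -1):
                records[j] = records[j - 1]
            records[boundary] = rec
            boundary += 1
    return boundary


data = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")]
size = extract_in_place_quadratic(data)
print(data[:size], data[size:])
```

A list in which every key appears exactly twice makes the shifting loop in the second function move on the order of n² records in total, which is the behaviour the block-based strategy is designed to avoid.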

In the following section, we discuss previous and ongoing work as it relates
to the main results of this paper, as well as to this general topic. In Sect. 3,
we introduce some necessary notation and define a few useful, primitive
suboperations. Section 4 contains an overview of the main algorithm along with an
example, where simplifying assumptions are made on list, sublist and block
sizes in order to facilitate discussion. Time and space measures are derived.
In Sect. 5, we describe minor implementation details that permit the simplifying
assumptions to be dropped. We also include a listing of our algorithm furnished
to us by the editor, Michael J. Fischer, who was kind enough to take the interest
to implement our technique in Pascal. Section 6 contains an analysis that shows
that the worst-case number of key comparisons and record exchanges sum to
no more than 6n, suggesting the algorithm may be of practical as well as
theoretical interest. In the final section, we present a brief discussion of some conclusions
that can be drawn from this effort.


2.  Related  Work 

The efficiency we shall achieve in breaking L into $O(\sqrt{n})$ blocks, each of size
$O(\sqrt{n})$, inherently relies on the notions of internal buffering and block rearranging,
which can be traced back to the seminal work described in [Kr]. As applied
in this paper, this general approach allows us to employ one block as the buffer
to aid in grouping the remainder of L into blocks of different types. Since only
the contents of the buffer and the relative order of the blocks need end up
out of sequence, linear time is sufficient to complete our task by sorting both
the buffer and the blocks (each sort involves $O(\sqrt{n})$ keys).

Previously  published  results  have  only  shown  merging  (and  hence  sorting 
by  merging)  to  yield  to  this  line  of  attack.  An  unstable  method  was  first  devised 
in  [Kr].  A  stable  scheme  was  later  proposed  in  [Ho]  that,  unfortunately,  had 
the  rather  undesirable  side-effect  that  records  had  to  be  alterable  during  its 
execution.  Subsequently,  a  general  algorithm  for  optimal  time  and  space,  stable 
merging  and  sorting  appeared  in  [Tr].  For  the  most  part,  however,  these  results 
have been of theoretical interest only, due primarily to their intricacy and
prohibitively large time complexity constants of proportionality.

Continued  research  efforts  have  focused  on  simpler,  more  practical  internal 
buffering  and  block  rearranging  strategies  for  optimal  time  and  space  unstable 
merging  [HL2,  MU]  and  stable  merging  [HL4,  SS2],  as  well  as  even  faster 
stable sorting schemes [HL4] that bypass the obvious merge-sort implementation.
It has also been shown that unmerging is amenable to this general approach
[SS1], although information other than a record's key alone must be available.
Furthermore,  we  have  very  recently  found  [HL3]  that  all  of  the  elementary 
binary  set  and  multiset  operations  can  be  performed  on  sorted  lists  in  linear 
time  and  constant  extra  space,  with  potential  application  to  a  number  of  file 
processing  problems. 

The  primary  aim  of  this  paper  is  to  demonstrate  that  stable  duplicate-key 
extraction  is  possible  in  optimal  time  and  space.  We  shall  also  show  that,  relative 
to  previously  published  results  along  this  line,  our  algorithm  is  straightforward 
and practical. Important differences between the techniques used in the well-known
merge strategy of [Tr] and the methods we employ in the duplicate-key
extraction method we present herein include these: 1) we pass the buffer directly
across  the  list  so  as  to  minimize  unnecessary  record  movement,  2)  we  avoid 
tedious  complications  in  the  special  case  in  which  there  are  not  enough  distinct 
keys  to  fill  the  buffer,  and  3)  we  delay  buffer  resequencing  until  the  final  step, 
thereby  sorting  it  only  once. 

3.  Notation  and  Preliminaries 

Let L denote a list of n records, indexed from 1 to n. We use KEY(i) as a
shorthand to denote the key of the record with index i, and assume that L
is sorted in nondecreasing order so that $\mathrm{KEY}(1) \le \mathrm{KEY}(2) \le \dots \le \mathrm{KEY}(n)$. For
the sake of complete generality, we allow neither the key nor any other part
of a record to be modified during duplicate-key extraction. Such is necessary,
for example, when there is no explicit key field within each record, but instead
a record's key is a function of one or more of its data fields. We use the term
$L_1$ record to denote one that is to go to $L_1$. That is, an $L_1$ record either
has index 1 or it has index i, for some $1 < i \le n$, where $\mathrm{KEY}(i-1) < \mathrm{KEY}(i)$.
We employ the term $L_2$ record in an analogous fashion to denote one that
is to go to $L_2$.

Only the two common $O(1)$ time and space elementary operations are
assumed, namely, record exchanges and key comparisons. The exchange procedure,
SWAP(i, j), directs that the ith and jth records are to be exchanged. The
comparison functions, for example $\mathrm{KEY}(i) < \mathrm{KEY}(j)$, return the expected Boolean
values dependent on the relative values of the keys being compared.

From these primitive operations we construct a few useful $O(1)$-space subprograms
for dealing with blocks. Let us define a block to be a set of records
from L with consecutive indices. The head of a block is the record with the
lowest index; the tail is the one with the highest index. If a block contains
only $L_1$ ($L_2$) records, then we refer to it as an $L_1$ ($L_2$) block. The procedure
BLOCKSWAP(i, j, h) exchanges a block of h records beginning at index i with
a block of h records beginning at index j in $O(h)$ time. We specify that blocks
do not partially overlap (i.e., if $i \ne j$ then $h \le |i - j|$) and that when BLOCKSWAP
is finished, records within a moved block retain the order they possessed before
BLOCKSWAP was invoked. A block of h records beginning at index i is sorted
in nondecreasing order by the procedure SORT(i, h). Finally, the procedure
BLOCKSORT(i, h, p) uses BLOCKSWAP to rearrange the p consecutive blocks,
each with h records, beginning at index i so that their heads are sorted in
nondecreasing order. To reduce unnecessary record movement, which is important
should records be relatively long, we assume SORT and BLOCKSORT
use a straight selection sort [Kn], yielding respective time complexities $O(h^2)$
and $O(p^2 + ph)$.
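
The primitives just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the list `a` is a Python list addressed with the paper's 1-based indices, and the explicit `a` and `key` parameters are additions needed to make the sketch self-contained.

```python
def swap(a, i, j):
    """SWAP(i, j): exchange the ith and jth records (1-based indices)."""
    a[i - 1], a[j - 1] = a[j - 1], a[i - 1]


def blockswap(a, i, j, h):
    """BLOCKSWAP(i, j, h): exchange two non-overlapping blocks of h records."""
    for d in range(h):
        swap(a, i + d, j + d)  # pairwise exchange keeps the order inside each block


def sort(a, i, h, key):
    """SORT(i, h): straight selection sort, O(h^2) comparisons, at most h-1 exchanges."""
    for left in range(i, i + h - 1):
        smallest = left
        for probe in range(left + 1, i + h):
            if key(a[probe - 1]) < key(a[smallest - 1]):
                smallest = probe
        if smallest != left:
            swap(a, left, smallest)


def blocksort(a, i, h, p, key):
    """BLOCKSORT(i, h, p): selection sort of p blocks of h records by head key."""
    for b in range(p - 1):
        left = i + b * h
        smallest = left
        for c in range(b + 1, p):
            probe = i + c * h
            if key(a[probe - 1]) < key(a[smallest - 1]):
                smallest = probe
        if smallest != left:
            blockswap(a, left, smallest, h)


blocks = ["C1", "C2", "A1", "A2", "B1", "B2"]
blocksort(blocks, 1, 2, 3, key=lambda r: r[0])
```

Selection sort is used, as in the text, because it moves each record (or block) at most a constant number of times per pass, which matters when records are long.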


4.  An  Overview  of  the  Main  Algorithm 

In order to facilitate discussion, let us assume for the moment that n is a perfect
square, with $a\sqrt{n}$ $L_1$ records and $b\sqrt{n}$ $L_2$ records, where a and b are positive
integers and $a + b = \sqrt{n}$. Figure 1a depicts such a list with a = 3, b = 3 and n = 36.
Only record keys are listed, represented by capital letters. Subscripts are added
to keep track of duplicate keys as the algorithm progresses.

The first step of the algorithm is to fill an “internal buffer” of size $\sqrt{n}$
with the $\sqrt{n}$ largest-keyed $L_1$ records (their order within the buffer is not important).
Thus we convert L into the form ABC where the records of A have not
been disturbed, B is the buffer, and C is a suffix (rightmost sublist) of $L_2$. B
is constructed by conducting a right-to-left scan of L. When a comparison of
adjacent keys reveals that a new $L_1$ record has been found, the record is included
in B. When an $L_2$ record is encountered, it is exchanged with the current rightmost
element of B. Therefore, B begins with size zero at the right end of L
and grows as we “roll” it toward the left until it accumulates $\sqrt{n}$ $L_1$ records,
now unordered by key. For simplicity, we assume that the size of A, and hence
C, is an integral multiple of $\sqrt{n}$. Figure 1 illustrates how this process modifies
our example list of 36 elements.

It should be clear that, up to this point, no more than n key comparisons
and n record exchanges are needed. Also, a couple of additional storage cells
are enough to keep track of the buffer’s position.
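
The buffer-filling scan can be sketched as below, following the structure of the BUFFERFILL procedure listed later in the paper. It is an illustration under assumptions: the Python rendering, the explicit list argument `a`, and the `key` parameter are not in the original, and the sketch assumes at least two records.

```python
def bufferfill(a, s, key):
    """Roll a buffer of up to s L1 records leftward from the right end of a.

    Indices follow the paper's 1-based convention. Returns (buffer, size):
    the index of the leftmost buffer record and the number of buffer records.
    """
    buffer, size = len(a) + 1, 0
    while True:
        buffer -= 1
        if key(a[buffer - 2]) == key(a[buffer - 1]):
            # L2 record: exchange it with the rightmost buffer record,
            # which moves the whole buffer one cell to the left.
            j = buffer + size
            a[buffer - 1], a[j - 1] = a[j - 1], a[buffer - 1]
        else:
            size += 1  # L1 record: it simply joins the buffer
        if buffer == 2 or size == s:
            return buffer, size


L = ("A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 E4 F1 F2 G1 H1 I1 I2 "
     "J1 K1 K2 M1 M2 M3 N1 O1 O2 P1 Q1 Q2 R1 R2 R3 S1 S2 S3").split()
original_prefix = L[:24]
position, size = bufferfill(L, 6, key=lambda r: r[0])
```

On the 36-record example this leaves the first 24 records untouched, the buffer in positions 25 to 30, and six $L_2$ records at the right end, matching the form ABC.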
A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 E4 F1 F2 G1 H1 I1 I2 J1 K1 K2 M1 M2 M3 N1 O1 O2 P1 Q1 Q2 R1 R2 R3 S1 S2 S3

a) Example list L, with a = 3, b = 3 and n = 36.

A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 E4 F1 F2 G1 H1 I1 I2 J1 K1 K2 M1 M2 M3 N1 O1 O2 P1 Q1 Q2 R1 R2 R3 S1 S2 S3

b) First buffer element is found.

A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 E4 F1 F2 G1 H1 I1 I2 J1 K1 K2 M1 M2 M3 N1 O1 O2 P1 Q1 Q2 R1 S1 R2 R3 S2 S3

c) Second buffer element is found.

A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 E4 F1 F2 G1 H1 I1 I2 J1 K1 K2 M1 M2 M3 N1 O1 R1 P1 Q1 S1 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: A, B, C.)

d) Buffer is filled.

Fig. 1. Filling the internal buffer, B


The second step of the algorithm is the most complex. We transform ABC
into XYC, where X contains $a$ $L_1$ blocks (one of which is B) and $YC = L_2$.
This is accomplished as follows. B is used to partition A into a collection of
$\sqrt{n}$-sized $L_1$ and $L_2$ blocks. B's initial position for this step will become the
rightmost $L_2$ block of Y. The $\sqrt{n}$ memory cells to its immediate left will become
an $L_1$ block. A is scanned from right to left until an $L_2$ record is found, which
is then exchanged with the buffer’s tail. The scan is continued, each record
in turn being exchanged with the rightmost buffer element in the appropriate
block. Thus the buffer is, in general, broken into two pieces, each to the left
of the growing edge of a block (see Fig. 2d). When an $L_1$ block is filled, a
new $L_1$ block is begun in the set of $\sqrt{n}$ cells to its immediate left. When an
$L_2$ block is filled, BLOCKSWAP is invoked if necessary to move it to its final
position, ousting an $L_1$ block there. Observe that handling $L_2$ blocks in this
manner ensures that Y will contain the appropriate prefix (leftmost sublist) of
$L_2$. Figure 2 shows how such an $L_2$ block is constructed.

A new $L_2$ block is begun in the position of the leftmost $L_1$ block, after
its c $L_1$ records, $0 \le c < \sqrt{n}$, are exchanged with the rightmost c records, all
from B, in the block to its immediate left. (Therefore a new $L_2$ block initially
contains all of B. See Fig. 3b.) When all of A has been scanned in the manner
described above, every block of X, except for B, remains sorted internally. The
blocks themselves may, however, be unordered with respect to each other. Figure 3
depicts how the transformation to XYC is completed on our example
list.

Therefore, during the second step of the algorithm, as the sublist AB is
transformed into XY it takes on the general form

$$U, B_1, P_1, Q_1, \dots, Q_k, B_2, P_2, Q_{k+1}, \dots, Q_l, V$$
A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 E4 F1 F2 G1 H1 I1 I2 J1 K1 K2 M1 M2 M3 N1 O1 R1 P1 Q1 S1 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: B, C.)

a) Example list after buffer is filled.

A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 E4 F1 F2 G1 H1 I1 I2 J1 K1 K2 M1 M2 S1 N1 O1 R1 P1 Q1 M3 O2 Q2 R2 R3 S2 S3

b) First L2 record is moved.

A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 E4 F1 F2 G1 H1 I1 I2 J1 K1 K2 M1 Q1 S1 N1 O1 R1 P1 M2 M3 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: B, C.)

c) Second L2 record is moved.

A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 E4 F1 F2 G1 H1 I1 I2 J1 K1 K2 S1 Q1 M1 N1 O1 R1 P1 M2 M3 O2 Q2 R2 R3 S2 S3
(A forked brace in the original figure marks the two pieces of B.)

d) First L1 record is moved.

A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 N1 R1 O1 S1 Q1 P1 F1 G1 H1 I1 J1 K1 M1 E4 F2 I2 K2 M2 M3 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: B, an L1 block, an L2 block, C.)

e) An L2 block is completed (in its final position).

Fig. 2. Constructing an L2 block


where $B_1 B_2$ is the buffer, $P_1$ is a partially-filled $L_1$ block, $P_2$ is a partially-filled
$L_2$ block, each $Q_j$ is an $L_1$ block of size $\sqrt{n}$, and V consists solely of $L_2$ records.
The size of $B_2 P_2$ is exactly $\sqrt{n}$. The basic operation is to swap the last record
of U with the last record of $B_1$ or $B_2$, depending on whether it is an $L_1$ or
an $L_2$ record. This causes U and $B_1$ or $B_2$ to shrink on the right by one element
and the blocks to their immediate right to grow accordingly. If $P_1$ becomes
full (i.e., contains $\sqrt{n}$ records), it is relabelled as a Q block and a new empty
$P_1$ block is placed immediately to its left. This basic operation is repeated until
$B_2$ becomes empty, at which point $P_2$ is a complete $L_2$ block. $P_2$ is then swapped
with $Q_l$, placing it adjacent to V, and yielding the configuration

$$U, B_1, P_1, Q_1, \dots, Q_k, Q_l, Q_{k+1}, \dots, Q_{l-1}, P_2, V.$$

To continue, the Q blocks are renumbered, and the old $P_2 V$ sublist is relabelled
as V. The entire buffer is now in $B_1$, but it is not necessarily aligned
on a block boundary since, in general, $P_1$ is shorter than $\sqrt{n}$. To align the
buffer, $P_1$ is swapped with the first part of $B_1$ giving

$$U, P_1, B_2, Q_1, \dots, Q_k, Q_{k+1}, \dots, Q_l, V$$

where $B_2$, the new buffer, is a rotation of the old $B_1$. The induction continues
by placing an empty $B_1$ in front of $P_1$ and an empty $P_2$ behind $B_2$, again
producing the general form (now with k = 0)

$$U, B_1, P_1, B_2, P_2, Q_1, \dots, Q_l, V$$
A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 N1 R1 O1 S1 Q1 P1 F1 G1 H1 I1 J1 K1 M1 E4 F2 I2 K2 M2 M3 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: B, an L1 block, an L2 block.)

a) Example list after L2 block is completed.

A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 F1 R1 O1 S1 Q1 P1 N1 G1 H1 I1 J1 K1 M1 E4 F2 I2 K2 M2 M3 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: B, an L1 block, an L2 block, C.)

b) An L1 record is moved so that another L2 block can be started.

A1 R1 Q1 N1 O1 P1 S1 B1 C1 D1 E1 F1 A2 C2 D2 D3 E2 E3 G1 H1 I1 J1 K1 M1 E4 F2 I2 K2 M2 M3 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: B, an L2 block, an L1 block, an L2 block, C.)

c) Next L2 block is completed.

A1 R1 Q1 N1 O1 P1 S1 B1 C1 D1 E1 F1 G1 H1 I1 J1 K1 M1 A2 C2 D2 D3 E2 E3 E4 F2 I2 K2 M2 M3 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: B, an L1 block, an L2 block, an L2 block, C.)

d) L2 block is moved to final position.

A1 B1 C1 D1 E1 F1 R1 Q1 N1 O1 P1 S1 G1 H1 I1 J1 K1 M1 A2 C2 D2 D3 E2 E3 E4 F2 I2 K2 M2 M3 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: an L1 block, B, an L1 block, an L2 block, an L2 block, C; a second row of braces marks X and Y.)

e) L1 records are moved. L2 sublist is completed.

Fig. 3. Finishing the L2 sublist

When U is exhausted, $P_1$ and $P_2$ will both be empty under our assumption
that the numbers of $L_1$ and $L_2$ records are both multiples of $\sqrt{n}$.

At most n comparisons are needed to compare adjacent keys in this second
step. $O(n)$ time suffices for record movement as well, since there are at most
n record exchanges and at most $2\sqrt{n}$ block exchanges. Only a small, constant
number of additional storage cells are needed for counters and pointers.

The final step of the algorithm is to transform X into $L_1$. To do this, B
is first sorted (recall that all keys in B are distinct, ensuring stability). Then
BLOCKSORT is used to sort the blocks of X by their heads. Since these blocks
were constructed one at a time, and since each is sorted internally, $L_1 L_2$ is
the final result. See Fig. 4.

$O(n)$ time and $O(1)$ space are sufficient for this final step, since each sort
involves at most $\sqrt{n}$ keys.


5.  Implementation  Details 

We  now  describe  the  necessary  implementation  details  that  permit  us  to  perform 
duplicate-key  extraction  without  regard  to  any  assumptions  about  actual  list, 
sublist or block sizes. Let $s = \lfloor \sqrt{n} \rfloor$ denote the size we will use for a block.
Thus the number of $L_1$ records is $s t_1 + e_1$, for some unique nonnegative integers
$t_1$ and $e_1$, where $e_1 < s$. Similarly, the number of $L_2$ records can be denoted
A1 B1 C1 D1 E1 F1 R1 Q1 N1 O1 P1 S1 G1 H1 I1 J1 K1 M1 A2 C2 D2 D3 E2 E3 E4 F2 I2 K2 M2 M3 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: an L1 block, B, an L1 block, L2.)

a) Example list after L2 sublist completed.

A1 B1 C1 D1 E1 F1 N1 O1 P1 Q1 R1 S1 G1 H1 I1 J1 K1 M1 A2 C2 D2 D3 E2 E3 E4 F2 I2 K2 M2 M3 O2 Q2 R2 R3 S2 S3
(Braces in the original figure mark, left to right: an L1 block, B, an L1 block, L2.)

b) B is sorted.

A1 B1 C1 D1 E1 F1 G1 H1 I1 J1 K1 M1 N1 O1 P1 Q1 R1 S1 A2 C2 D2 D3 E2 E3 E4 F2 I2 K2 M2 M3 O2 Q2 R2 R3 S2 S3

c) BLOCKSORT is performed on blocks of X. L1 sublist is completed.

Fig. 4. Finishing the L1 sublist


by the expression $s t_2 + e_2$, $e_2 < s$. Note that $(s+1)^2 > n \ge s t_1 + s t_2$ guarantees
that the total number of s-sized blocks, $t_1 + t_2$, is no more than $s + 2$. We now
consider the file as a collection of $t_1 + t_2 + 2$ blocks, the first a (possibly empty)
block of size $e_1$, followed by the $t_1 + t_2$ s-sized blocks, followed by a final (possibly
empty) block of size $e_2$.
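
The block parameters above amount to two integer divisions, as the following sketch shows. The function name `block_parameters` and the sample values n = 40 with 23 $L_1$ records are illustrative assumptions, not taken from the paper.

```python
import math


def block_parameters(n, n1):
    """Return (s, t1, e1, t2, e2) for n records, n1 of which are L1 records."""
    s = math.isqrt(n)              # block size, floor(sqrt(n))
    t1, e1 = divmod(n1, s)         # n1 = s*t1 + e1 with 0 <= e1 < s
    t2, e2 = divmod(n - n1, s)     # n - n1 = s*t2 + e2 with 0 <= e2 < s
    return s, t1, e1, t2, e2


s, t1, e1, t2, e2 = block_parameters(40, 23)
```

For these sample values the list is viewed as a leading partial block of $e_1 = 5$ records, $t_1 + t_2 = 5$ full blocks of $s = 6$ records, and a trailing partial block of $e_2 = 5$ records.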

First,  we  attempt  to  fill  the  internal  buffer.  Observe  that  in  doing  so  we 
may  have  examined  all  L2  records,  in  which  case  we  merely  sort  the  (perhaps 
only  partially  filled)  buffer  and  halt.  Otherwise,  we  scan  the  remainder  of  L 
to determine the number of $L_1$ records, from which we derive the value of
$e_1$. The block containing the rightmost buffer element becomes the first $L_2$
block, the block to its immediate left the first $L_1$ block. We are thus either
finished  with  the  last  block,  which  is  of  size  e2 ,  or  soon  will  be.  Its  records 
will  end  up  in  their  final  position;  its  unusual  size  can  cause  no  problem. 

We  next  initiate  the  second  step  of  the  algorithm  as  outlined  in  the  previous 
section.  When  the  last  L2  block  has  been  filled  and  moved,  we  know  that  there 
is no need to continue scanning A. Note that, with the final exchange of fewer
than s $L_1$ and buffer records, the buffer now occupies one full block, ensuring
that we are finished with the first block (which has the unusual size $e_1$) as
well as any other block to the left of the buffer.

We  complete  the  job  of  duplicate-key  extraction  by  sorting  and  moving 
the buffer to its final position, then invoking BLOCKSORT on the $L_1$ blocks
that may need rearrangement. Hence the following detailed description of algorithm
EXTRACT and its two subprograms BUFFERFILL and BLOCKIFY.
Our original (and, we confess, rather abstruse) description in pidgin Algol [HL1]
is herewith replaced with the lucid and well-commented Pascal code provided
by Mike Fischer.

6.  Constant  of  Proportionality  Bounds 

We  now  bound  the  worst-case  number  of  key  comparisons  and  record  exchanges 
performed by EXTRACT. Exactly $n - 1$ comparisons are employed to extract
the buffer and to count the number of $L_1$ records. BLOCKIFY requires no
more than $n - s - 1$ comparisons, all within its while loop. Our selection sort
PROCEDURE extract;
VAR
  s: integer;       { block size }
  buffer: integer;  { pointer to leftmost buffer element }
  size: integer;    { number of records in the buffer }
  new: integer;     { pointer to next record to be scanned }
  count: integer;   { total number of L1 records }
  i: integer;       { loop index }
BEGIN
  { compute block size }
  s := trunc(sqrt(n));

  { create buffer }
  bufferfill(s, buffer, size);

  IF buffer = 2 THEN
    sort(2, size)
  ELSE BEGIN  { buffer size is s }
    { initialize scanner }
    new := buffer - 1;

    { count the number of L1 records }
    count := s + 1;
    FOR i := 2 TO new DO
      IF key(i-1) < key(i) THEN count := count + 1;

    { process remaining records into L1 blocks }
    buffer := blockify(new, count, s);

    { sort buffer }
    sort(buffer, s);

    { put buffer in its proper place }
    blockswap(buffer, count-s+1, s);

    { sort the L1 blocks to right of buffer pointer }
    blocksort(buffer, s, (count+1-buffer) div s - 1);
  END;
END;


{ Fills buffer with s records, if possible. }
PROCEDURE bufferfill(
  s: integer;           { size of block }
  VAR buffer: integer;  { returns pointer to left end of buffer }
  VAR size: integer     { returns size of buffer }
);
BEGIN
  buffer := n+1;
  size := 0;
  REPEAT
    buffer := buffer - 1;
    IF key(buffer-1) = key(buffer) THEN
      { move L2 record to proper place }
      swap(buffer, buffer+size)
    ELSE
      { include L1 record in buffer }
      size := size + 1;
  UNTIL (buffer = 2) or (size = s);
END;



implementation of SORT can direct at most $\frac{1}{2}s(s-1)$ comparisons [Kn]. Similarly,
BLOCKSORT needs no more than $\frac{1}{2}(t_1-1)(t_1-2) \le \frac{1}{2}(s+1)s$ comparisons.
Therefore, the total number of key comparisons is at most
$(n-1) + (n-s-1) + \frac{1}{2}s(s-1) + \frac{1}{2}(s+1)s = 2n + s^2 - s - 2 \le 3n - s - 2 < 3n$.
{ Processes remaining records. }
FUNCTION blockify(
  new: integer;    { pointer to rightmost unprocessed record }
  count: integer;  { number of L1-type records in list }
  s: integer       { block size }
): integer;        { returns pointer to buffer }
VAR
  ptr1: integer;      { place for next element in current L1 block }
  ptr2: integer;      { place for next element in current L2 block }
  L2_block: integer;  { pointer to position of next L2 block }
  e1: integer;        { size of first (partial) block }

  { Returns a pointer to left end of block containing element i. }
  FUNCTION blockhead(i: integer): integer;
  BEGIN
    blockhead := i - (s + i - e1 - 1) mod s;
  END;

BEGIN  { body of blockify }
  { set size of first block }
  e1 := count mod s;                { size of first (partial) block }
  L2_block := blockhead(new + s);   { where next L2 block goes }
  ptr1 := L2_block - 1;             { points into current L1 block }
  ptr2 := new + s;                  { points into current L2 block }

  { fill each L2 block in turn }
  WHILE L2_block >= count+1 DO BEGIN

    { fill current L2 block }
    REPEAT
      { move past any L1 records }
      WHILE key(new-1) < key(new) DO BEGIN
        swap(new, ptr1);
        new := new - 1;
        ptr1 := ptr1 - 1;
      END;

      { found L2 element, so put in current L2 block }
      swap(new, ptr2);
      new := new - 1;
      ptr2 := ptr2 - 1;
    UNTIL ptr2 + 1 = blockhead(ptr2 + 1);

    { put current L2 block into proper place at position L2_block }
    blockswap(ptr2 + 1, L2_block, s);

    { new L1 block is block containing new }
    ptr1 := new;

    { new L2 block is next block to its right }
    ptr2 := blockhead(new) + 2*s - 1;

    { swap L1-type elements out of new L2 block }
    blockswap(new + 1, new + s + 1, ptr2 - new - s);

    { point to eventual destination of new L2 block }
    L2_block := L2_block - s;
  END;  { while }

  blockify := ptr2 - s + 1;  { return pointer to buffer }
END;
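
For readers without a Pascal compiler, the listing can be transliterated roughly as follows. This is a hedged sketch rather than a verified implementation: it mirrors the listing's index arithmetic on a Python list with 1-based helper access, assumes at least two records, and has been checked here only against the 36-record example of Figs. 1 to 4 and a 4-record list. The helper names and the `key` parameter are assumptions of the sketch.

```python
import math


def extract(a, key):
    """Rearrange the sorted list a in place into L1 followed by L2 (len(a) >= 2)."""
    n = len(a)
    s = math.isqrt(n)                      # block size

    def k(i):                              # 1-based key access, as in the listing
        return key(a[i - 1])

    def swap(i, j):
        a[i - 1], a[j - 1] = a[j - 1], a[i - 1]

    def blockswap(i, j, h):
        for d in range(h):
            swap(i + d, j + d)

    def sort(i, h):                        # straight selection sort of one block
        for left in range(i, i + h - 1):
            swap(left, min(range(left, i + h), key=k))

    def blocksort(i, h, p):                # selection sort of p blocks by head key
        for b in range(p - 1):
            heads = range(i + b * h, i + p * h, h)
            blockswap(i + b * h, min(heads, key=k), h)

    # BUFFERFILL: roll the buffer leftward from the right end.
    buffer, size = n + 1, 0
    while True:
        buffer -= 1
        if k(buffer - 1) == k(buffer):
            swap(buffer, buffer + size)    # L2 record hops over the buffer
        else:
            size += 1                      # L1 record joins the buffer
        if buffer == 2 or size == s:
            break
    if buffer == 2:                        # every L2 record has been examined
        sort(2, size)
        return

    new = buffer - 1
    count = s + 1 + sum(1 for i in range(2, new + 1) if k(i - 1) < k(i))

    # BLOCKIFY: distribute the remaining records into L1 and L2 blocks.
    e1 = count % s

    def blockhead(i):
        return i - (s + i - e1 - 1) % s

    l2_block = blockhead(new + s)
    ptr1, ptr2 = l2_block - 1, new + s
    while l2_block >= count + 1:
        while True:
            while k(new - 1) < k(new):     # L1 records go to the current L1 block
                swap(new, ptr1)
                new, ptr1 = new - 1, ptr1 - 1
            swap(new, ptr2)                # L2 record goes to the current L2 block
            new, ptr2 = new - 1, ptr2 - 1
            if ptr2 + 1 == blockhead(ptr2 + 1):
                break
        blockswap(ptr2 + 1, l2_block, s)   # move finished L2 block into place
        ptr1 = new
        ptr2 = blockhead(new) + 2 * s - 1
        blockswap(new + 1, new + s + 1, ptr2 - new - s)  # realign the buffer
        l2_block -= s
    buffer = ptr2 - s + 1

    sort(buffer, s)
    blockswap(buffer, count - s + 1, s)
    blocksort(buffer, s, (count + 1 - buffer) // s - 1)


records = ("A1 A2 B1 C1 C2 D1 D2 D3 E1 E2 E3 E4 F1 F2 G1 H1 I1 I2 "
           "J1 K1 K2 M1 M2 M3 N1 O1 O2 P1 Q1 Q2 R1 R2 R3 S1 S2 S3").split()
extract(records, key=lambda r: r[0])
```

Intermediate buffer orderings produced by this transliteration need not coincide with those drawn in the figures, since the order of records inside the buffer is immaterial until the final sort.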


As for record exchanges, SWAP is invoked at most $n - s - 1$ times. Each
BLOCKSWAP within BLOCKIFY is called at most $t_2$ times, always with
block size less than or equal to s. The main algorithm’s final SORT requires
$s - 1$ exchanges, followed by BLOCKSWAP needing s. Finally, BLOCKSORT
uses $s(t_1 - 2)$ exchanges. Therefore, the total number of record exchanges is
at most $(n-s-1) + 2st_2 + (s-1) + s + s(t_1-2) = n + s(t_1+t_2) + st_2 - s - 2 \le 3n - s - 2 < 3n$.
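
As a quick arithmetic check, the two worst-case expressions derived in this section can be evaluated for the 36-record example, where $s = 6$ and $t_1 = t_2 = 3$. The function name `worst_case_bounds` is an illustrative assumption; the formulas are the ones stated in the text.

```python
import math


def worst_case_bounds(n, t1, t2):
    """Evaluate the stated upper bounds on key comparisons and record exchanges."""
    s = math.isqrt(n)
    comparisons = (n - 1) + (n - s - 1) + s * (s - 1) // 2 + (s + 1) * s // 2
    exchanges = (n - s - 1) + 2 * s * t2 + (s - 1) + s + s * (t1 - 2)
    return comparisons, exchanges


comparisons, exchanges = worst_case_bounds(36, 3, 3)
```

Both values stay below $3n = 108$ for this example, so their sum stays below $6n$.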

Since key comparisons and record exchanges are likely to be by far the
most time-consuming operations for EXTRACT, and since they are both storage-to-storage
type instructions for most architectures, we conclude that 6n is a
reasonable estimate of the worst-case constant of proportionality for this algorithm’s
$O(n)$ time complexity. As for $O(1)$ space, a review of our code reveals
that we have explicitly used but 10 additional storage cells for pointers, counters
and the like.

Incidentally, we have conducted a series of experiments to compare EXTRACT's
average-case behavior to that of a naive but efficient algorithm free
to exploit the temporary use of $O(n)$ extra memory, and found that the expected
penalty for performing duplicate-key extraction in place is less than a quadrupling
of program execution times. At the editor’s suggestion, however, we have
omitted the presentation of our experimental findings from this paper. The interested
reader is referred to [HL1].


7.  Conclusions 

We  have  devised  an  algorithm  that  performs  stable  duplicate-key  extraction 
in  linear  time  and  constant  space,  and  is  therefore  optimal  to  within  a  constant 
factor.  We  have  also  bounded  its  worst-case  number  of  key  comparisons  and 
record  exchanges  to  indicate  that  it  is  practical. 

This  algorithm  could  be  especially  useful  when  viewed  as  an  efficient  means 
for  increasing  the  effective  size  of  internal  memory  when  performing  duplicate- 
key  extraction  on  a  much  larger  external  file.  This  tends  to  decrease  the  number 
of  input/output  operations  needed,  thereby  dramatically  reducing  the  overall 
execution  time.  Similarly,  this  algorithm  could  be  employed  to  great  advantage 
when  managing  critical  resources  such  as  cache  memory  or  other  relatively 
small,  high-speed  memory  components. 

The classical bin packing problem is one of the best-known and most widely studied problems
of combinatorial optimization. Efficient offline approximation algorithms have recently been
designed and analyzed for the more general and realistic model in which bins of differing
capacities are allowed (Friesen and Langston (1986)). In this paper, we consider fast online
algorithms for this challenging model. Selecting either the smallest or the largest available bin size
to begin a new bin as pieces arrive turns out to yield a tight worst-case ratio of 2. We devise a
slightly more complicated scheme that uses the largest available bin size for small pieces, and
selects bin sizes for large pieces based on a user-specified fill factor $f \ge 1/2$, and prove that this
strategy guarantees a worst-case bound not exceeding $1.5 + f/2$.


1.  Introduction 

In the classical bin packing problem, the objective is to pack a list of n pieces
$P = (p_1, p_2, \dots, p_n)$, each with a size in the range (0, 1], into the minimum number
of  unit-capacity  bins.  The  general  significance  of  this  NP-complete  problem  is 
reflected  in  the  great  attention  it  has  received  in  the  literature  (see  [2]  for  an  updated 
survey). 

Recently,  important  generalizations  of  the  bin  packing  problem  have  been 
investigated  [3,  4]  in  which  bin  capacities  may  vary.  In  particular,  the  model  of  [4] 
permits  a  fixed  collection  of  bin  sizes,  where  the  objective  is  to  minimize  the  total 
space  of  the  packing.  This  model  is  considerably  more  realistic  than  that  of  the 
classical  problem.  (We  observe,  for  example,  that  the  classical  problem  corresponds 
to  a  lumber  yard  that  sells  2x4s  in  8-foot  lengths  only!)  In  [4],  some  practical, 
offline  algorithms  for  variable-sized  bin  packing  were  designed  and  analyzed.  The 
most complicated of those, termed FFDLS, was proved always to produce a packing
whose total space is asymptotically bounded by $\frac{4}{3}$ times the optimum. Also, from
a more purely theoretical standpoint, an offline fully polynomial-time approximation
scheme has very recently been devised in [11] using a linear programming
formulation of the problem.

In  this  paper,  we  explore  the  worst-case  behavior  of  fast,  online  variable-sized  bin 
packing schemes. An online algorithm cannot preview and rearrange the elements
of  P  before  it  starts  to  construct  a  packing,  but  must  instead  accept  and  immediately 
pack  each  piece  as  it  arrives.  A  number  of  online  strategies  have  been  proposed  and 
analyzed  for  the  classical  problem.  See,  for  example,  [9,  10,  12,  13].  What  makes 
our  problem  even  more  difficult  is  that  whenever  an  online  algorithm  decides  to 
begin  a  new  bin,  it  must  also  select  the  size  of  that  bin,  and  cannot  go  back  later 
to  repack  or  consolidate  bins. 


2.  Notation 

Let k denote the number of distinct bin sizes available, where there is an unlimited
supply of bins of each size. We normalize bin and piece sizes so that the largest
bin is of size 1 (and thus the size of the largest piece cannot exceed 1). Let
$B = (B_1, B_2, \dots, B_l)$ denote the ordered list of l bins containing P as packed by an
online algorithm, ALG. We use $B^*$ for the corresponding optimum packing with
m bins. We employ the function s to specify bin and piece sizes, and use the function
c to specify the total contents of a bin. For example, $s(p_1)$ denotes the size of the
first piece and $c(B_1)$ denotes the cumulative size of all pieces ALG packs in its first
bin. Finally, given an instance I of variable-sized bin packing, we use ALG(I) and
OPT(I) to denote the values $s(B)$ and $s(B^*)$, respectively.


3.  Some  simple  algorithms 

One option for an online algorithm is simply to begin a new bin whenever the next
available piece will not fit into the current bin. If bins of size 1 are always used,
this $O(n)$-time scheme is called NFL (Next Fit, using Largest possible bins). The
following result is from [4] and is reproduced here for the purpose of illustration.

Theorem 3.1. $\mathrm{NFL}(I) < 2 \cdot \mathrm{OPT}(I) + 1$ for any instance I.

Proof. For $1 \le i < l$, $c(B_i) + c(B_{i+1}) > 1$. Therefore

$$\sum_{i=1}^{l} c(B_i) > \tfrac{1}{2}(l - 1)$$

and

$$\mathrm{NFL}(I) = (l - 1) + 1 < 2 \sum_{i=1}^{l} c(B_i) + 1 = 2 \sum_{i=1}^{m} c(B_i^*) + 1 \le 2 \sum_{i=1}^{m} s(B_i^*) + 1 = 2 \cdot \mathrm{OPT}(I) + 1. \qquad \Box$$
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Any  packing  instance  consisting  of  pieces  of  size  y  +  e  and  bins  of  sizes  1  and  y  +  e, 
for  some  arbitrarily  small  e>0,  demonstrates  that  the  bound  of  2  is  asymptotically 
tight  for  NFL. 
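The tight instance just described can be checked numerically. The following is a minimal sketch, assuming exact rational arithmetic and a particular choice of $\varepsilon$; the function name `next_fit_largest` and the variables below are illustrative and do not come from the paper.

```python
from fractions import Fraction

def next_fit_largest(pieces):
    """Pack pieces online with Next Fit into unit bins; return c(B_i) for each bin."""
    contents = []
    for size in pieces:
        # Open a new unit-size bin when the current (last) bin cannot hold the piece.
        if not contents or contents[-1] + size > 1:
            contents.append(Fraction(0))
        contents[-1] += size
    return contents

eps = Fraction(1, 100)                      # assumed value of the small constant
pieces = [Fraction(1, 2) + eps] * 10        # ten pieces of size 1/2 + eps
nfl_cost = len(next_fit_largest(pieces))    # every NFL bin has size 1
small_bin_cost = sum(pieces)                # one bin of size 1/2 + eps per piece
print(nfl_cost, float(nfl_cost / small_bin_cost))
```

Because no two pieces fit together in a unit bin, NFL pays 1 per piece, while a packing that uses one bin of size $\tfrac{1}{2} + \varepsilon$ per piece pays only the total piece size; the ratio tends to 2 as $\varepsilon$ shrinks.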

Another  alternative  is  to  review  all  partially  packed  bins,  placing  the  next 
available  piece  in  the  first  bin  with  room  for  it,  beginning  a  new  bin  only  when 
necessary.  If  bins  of  size  1  are  always  used,  we  denote  this  approach  by  FFL  (First 
Fit,  using  Largest  possible  bins).  By  efficiently  conducting  the  review  of  partially 
packed bins [6], FFL can be implemented to run in $O(n \log n)$ time.

Theorem 3.2. $\mathrm{FFL}(I) < 2 \cdot \mathrm{OPT}(I) + 1$ for any instance I.

Proof.  Use  the  same  series  of  arguments  presented  in  the  proof  of  Theorem  3.1.  □ 

While  the  worst-case  behavior  of  First  Fit  is  superior  to  that  of  Next  Fit  for  the 
classical  problem  [2,  5],  this  is  not  the  case  for  NFL  and  FFL  when  applied  to 
variable-sized  bin  packing.  In  fact,  any  online  algorithm  that  uses  the  largest 
possible  bins  will  produce  the  same  packing  when  presented  with  a  troublesome 
instance  such  as  the  one  described  immediately  following  the  proof  of  Theorem  3.1. 

Given  the  egregious  behavior  resulting  from  the  use  of  large  bins,  we  next 
consider  FFS  (First  Fit,  using  Smallest  possible  bins),  of  time  complexity 
$O(n \log n + n \log k)$.

Theorem 3.3. $\mathrm{FFS}(I) < 2 \cdot \mathrm{OPT}(I) + 1$ for any instance I.

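Proof. Since First Fit is used, $c(B_1) + c(B_l) > s(B_1)$. Also, $c(B_i) + c(B_{i+1}) > s(B_i)$ for
$1 \le i < l$. Therefore,

$$\mathrm{FFS}(I) = \sum_{i=1}^{l} s(B_i) \le \sum_{i=1}^{l-1} \bigl( c(B_i) + c(B_{i+1}) \bigr) + s(B_l)$$

$$< 2 \sum_{i=1}^{l} c(B_i) + s(B_l)$$

$$\le 2 \sum_{i=1}^{l} c(B_i) + 1$$

$$= 2 \sum_{i=1}^{m} c(B_i^*) + 1 \le 2 \sum_{i=1}^{m} s(B_i^*) + 1$$

$$= 2 \cdot \mathrm{OPT}(I) + 1. \qquad \Box$$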

Unfortunately, FFS performs no better in the worst case than does NFL or FFL,
since any packing instance consisting of pieces of size $\tfrac{1}{2}$ and bins of sizes 1 and
$1 - \varepsilon$, for some arbitrarily small $\varepsilon > 0$, demonstrates that the bound of 2 is
asymptotically tight for FFS.


4. Main result

We observe that FFL errs in its packing of “large” pieces (those with size
exceeding $\tfrac{1}{2}$), while FFS errs in its packing of “small” pieces (those with size less
than or equal to $\tfrac{1}{2}$). Therefore, we now focus our attention on a hybrid approach
[7] that we shall denote by FFf. Let f denote a user-specified fill factor in the range
$[\tfrac{1}{2}, 1]$. Suppose FFf must start a new bin using a piece $p_i$. If $p_i$ is a small piece, then
FFf starts a new bin of size 1. If $p_i$ is a large piece, then FFf selects the smallest bin
size in the range $[s(p_i), s(p_i)/f]$ if such a size exists, else it uses bin size 1. In other
words, when a piece $p_i$ needs a new bin, FFf selects a unit-capacity bin if and only if
either $s(p_i) \le \tfrac{1}{2}$ or there is no bin with size less than 1 available that $p_i$ can fill
to at least the fraction f.
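The bin-selection rule above can be sketched in a few lines. This is an illustrative sketch only: the names `choose_bin_size` and `first_fit_fill` are hypothetical, sizes are exact fractions, the list of bin sizes is assumed to contain the unit size, and the demonstration instance (four pieces of size $\tfrac{1}{3} + \varepsilon$, four of size $\tfrac{1}{2} + \varepsilon$, bin sizes $\tfrac{5}{6} + 2\varepsilon$ and 1) is an assumed reading of the example given later in the Discussion.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def choose_bin_size(piece, bin_sizes, f):
    """Size of the new bin FFf opens for a piece that fits in no open bin."""
    if piece <= HALF:                       # small piece: always a unit bin
        return Fraction(1)
    # Large piece: smallest available size in the range [s(p), s(p)/f], if any.
    candidates = [s for s in bin_sizes if piece <= s <= piece / f]
    return min(candidates) if candidates else Fraction(1)

def first_fit_fill(pieces, bin_sizes, f):
    """Run FFf online; return the total size of the bins it opens."""
    bins = []                               # each entry is [bin size, current contents]
    for p in pieces:
        for b in bins:                      # First Fit over the open bins, in order
            if b[1] + p <= b[0]:
                b[1] += p
                break
        else:
            bins.append([choose_bin_size(p, bin_sizes, f), p])
    return sum(b[0] for b in bins)

eps = Fraction(1, 1000)
sizes = [Fraction(5, 6) + 2 * eps, Fraction(1)]
pieces = [Fraction(1, 3) + eps] * 4 + [HALF + eps] * 4
cost_f06 = first_fit_fill(pieces, sizes, Fraction(3, 5))
cost_f05 = first_fit_fill(pieces, sizes, HALF)
print(cost_f06, cost_f05)
```

With f = 0.6 the smaller bin size lies just outside the admissible range for the large pieces, so every large piece receives a unit bin; with f = 0.5 the smaller size is admissible and is chosen instead.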

Theorem 4.1. $\mathrm{FFf}(I) \le (1.5 + f/2) \cdot \mathrm{OPT}(I) + 2$ for any instance I.

Proof. Given an arbitrary instance, I, we classify bins of the FFf packing as follows:
a bin of type X has size 1 and contains a single piece; a bin of type Y has size 1
and contains two or more pieces; a bin of type Z has size less than 1.

We deviate from this classification for at most two “exceptional” bins. Every bin
of type X, except at most one, must contain a large piece. (To see this, observe that
if a bin of type X contains a small piece, then every subsequent bin of type X must
contain a large piece.) If there is a bin of type X that contains a small piece, then
we change its classification from type X to exceptional. Similarly, every bin of type
Y, except at most one, must be more than $\tfrac{2}{3}$ full. (To see this, observe that if a bin
of type Y is at most $\tfrac{2}{3}$ full, then every subsequent bin of type Y must contain at least
two pieces, each of size greater than $\tfrac{1}{3}$.) If there is a bin of type Y that is at most
$\tfrac{2}{3}$ full, then we change its classification from type Y to exceptional.

Let x and y denote the number of bins of types X and Y, respectively. Let z denote
the sum of the sizes of all bins of type Z. Thus $\mathrm{FFf}(I) = x + y + z + s(\text{any exceptional bins})$
and we have

$$\mathrm{FFf}(I) - 2 \le x + y + z. \qquad (1)$$

We now obtain two distinct lower bounds for OPT(I). For the first, we consider
the bin sizes available. Since every bin of type X or Z contains a piece whose size
exceeds $\tfrac{1}{2}$, no two such pieces can share one bin in an optimal packing of I. From
the way FFf selects a new bin for a large piece when one is required, a piece, $p_i$,
from a bin of type X requires a bin of size at least $\min\{1, s(p_i)/f\} \ge 1/(2f)$ in an
optimal packing. Also, any large piece used by FFf to begin a bin of type Z is packed
as tightly as possible. Therefore, we have

$$\mathrm{OPT}(I) \ge \tfrac{1}{2} x / f + z. \qquad (2)$$

For the second lower bound, we consider the total size of all pieces. The contents
of type X bins sum to more than $\tfrac{1}{2} x$. The contents of type Y bins sum to more than
$\tfrac{2}{3} y$. The contents of type Z bins sum to at least $fz$. Thus, since $x + y > 0$, we have

$$\mathrm{OPT}(I) > \tfrac{1}{2} x + \tfrac{2}{3} y + fz. \qquad (3)$$

Let R denote $(\mathrm{FFf}(I) - 2)/\mathrm{OPT}(I)$. From (1) and (2) we know

$$R \le (x + y + z) / (\tfrac{1}{2} x / f + z)$$

from which we derive

$$y \ge \tfrac{1}{2} R x / f + Rz - x - z. \qquad (4)$$

From (1) and (3) we know

$$R \le (x + y + z) / (\tfrac{1}{2} x + \tfrac{2}{3} y + fz)$$

in which we substitute the lower bound for y from (4), since it is known [1, 8]
that, even for unit-capacity bins, any online algorithm’s worst-case ratio exceeds
$1.536 > \tfrac{3}{2}$. Thus we derive

$$R \le (3x + fx + fz(10 - 6f)) / (2x + 4fz)$$

which is bounded above by $\tfrac{1}{2}(3x + fx)/x = 1.5 + f/2$ as long as f is bounded below by
$\tfrac{1}{2}$. Therefore,

$$\mathrm{FFf}(I) = R \cdot \mathrm{OPT}(I) + 2 \le (1.5 + f/2) \cdot \mathrm{OPT}(I) + 2. \qquad \Box$$
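The substitution step in the proof compresses several lines of algebra. The following worked sketch fills them in, under the assumption that $R > \tfrac{3}{2}$ (otherwise the claimed bound holds trivially), so that the coefficient $\tfrac{2}{3} R - 1$ of y is positive and the lower bound (4) may be substituted.

```latex
\begin{align*}
R\left(\tfrac12 x + \tfrac23 y + fz\right) &\le x + y + z
  && \text{from (1) and (3)} \\
\left(\tfrac23 R - 1\right) y &\le x + z - \tfrac12 Rx - Rfz
  && \text{collect the terms in } y \\
\left(\tfrac23 R - 1\right)\left(\tfrac{Rx}{2f} + Rz - x - z\right) &\le x + z - \tfrac12 Rx - Rfz
  && \text{by (4), since } \tfrac23 R - 1 > 0 \\
\tfrac23 R\left(\tfrac{x}{2f} + z\right) &\le \tfrac23 (x + z) + \tfrac{x}{2f} + z - \tfrac12 x - fz
  && \text{expand, cancel } x + z,\ \text{divide by } R \\
R\,(2x + 4fz) &\le 3x + fx + fz(10 - 6f)
  && \text{multiply by } 6f
\end{align*}
% The right-hand ratio is a mediant of (3 + f)/2 and (10 - 6f)/4, and
% (10 - 6f)/4 <= (3 + f)/2 holds exactly when f >= 1/2.
```

The final ratio is therefore at most $\max\{(3 + f)/2,\ (10 - 6f)/4\} = 1.5 + f/2$ whenever $f \ge \tfrac{1}{2}$.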


5.  Discussion 

Surprisingly, we conclude from Theorem 4.1 that the simplest variant of FFf may
be the best (in the worst-case sense). By setting f = 0.5, a small piece needing a new
bin always gets the largest bin available while a large piece needing a new bin always
gets the smallest bin that can contain it. Let FFH denote this limiting-case hybrid.

Corollary 5.1. $\mathrm{FFH}(I) \le 1.75 \cdot \mathrm{OPT}(I) + 2$ for any instance I.

For the classical problem, examples exist [5] to demonstrate that FF(I) can
approach arbitrarily close to $1.7 \cdot \mathrm{OPT}(I)$ from below. Therefore, of course, the
same holds for FFH under the packing model addressed herein.

The determination of just how tight the bounds given in Theorem 4.1 are is as
yet an open issue. (Slightly more complicated arguments easily reduce the additive
constant from 2 to 1.) However, examples such as the one that follows demonstrate
that Corollary 5.1 does not extend to arbitrary values of f, and that the guarantee
of Theorem 4.1 is indeed dependent on f. Let n be evenly divisible by 4. Suppose
that P contains $\tfrac{1}{2} n$ pieces of size $\tfrac{1}{3} + \varepsilon$ followed by $\tfrac{1}{2} n$ pieces of size $\tfrac{1}{2} + \varepsilon$, and that
bin sizes are $\tfrac{5}{6} + 2\varepsilon$ and 1 for some arbitrarily small $\varepsilon > 0$. Thus, for f = 0.6, FFf(I)
can exceed any value strictly less than $1.8 \cdot \mathrm{OPT}(I)$.

Finally, let the parameter q denote an integer greater than 1 and suppose
$s(p_i) \le 1/q$ for $1 \le i \le n$. (In this case, FFf reduces to FFL.) As in [5, Theorem 2.3]
we note that FFL ensures the inequality $c(B_i) \ge q/(q + 1)$ for all but at most two
values of i in the range $[1, l]$. Therefore, in this parameterized environment, there
is no worst-case penalty for variable-sized bin packing. That is, we have

$$\mathrm{FFL}(I) \le ((q + 1)/q) \cdot \mathrm{OPT}(I) + 2,$$

the same as for the classical problem. However, the freedom to choose bin (and
piece) sizes of $1/(q + 1) + \varepsilon$, for arbitrarily small $\varepsilon > 0$, greatly simplifies the job of
establishing the asymptotic tightness of this parameterized bound.


We study the problem of resource sharing within a system of users, each with the same resource
capacity, but with varying resource demands. For model simplicity, we assume system saturation,
so that the total demand matches the total capacity, and permit only a limited form of sharing
in which a user is free to share its unused capacity with exactly one other user. We seek to
maximize the total amount of unshared capacity over all feasible solutions, reflecting an
environment in which sharing incurs a cost proportional to the overall quantity shared. For the
general problem, which is NP-hard, we derive a tight worst-case performance bound for a greedy
algorithm G as well as for a number of other sharing rules. We also prove several results
concerning G’s behavior in more restricted settings.

Keywords. Resource allocation, computational complexity, approximation algorithms,
combinatorial optimization.


1.  Introduction 

Consider a collection of n users, each with an identical resource capacity C, and
each with a specific resource demand $d_i$, $1 \le i \le n$. The simplest case, and the one
on which we focus our attention, assumes that the system is saturated, so that
$\sum_{i=1}^{n} d_i = nC$. A user whose demand is less than C is permitted to share its excess
capacity with one whose demand exceeds C. We seek to redistribute such excess
within the system so as to maximize the total capacity unshared, modeling an
environment in which the cost of sharing is directly proportional to the overall quantity
shared.

If the number of users allowed to share one user’s excess is unlimited, then this
problem can be easily solved in polynomial time and is left to the reader as an
exercise. If a bound is placed on the number of users who may simultaneously share one
user’s excess, then the problem is NP-hard. The extreme case, and the one we explore
here, permits a very limited form of sharing, in which only one user is permitted
to share another’s excess. (This limit applies to the lender’s excess, not the borrower’s
demand. A user with a great demand may have to borrow from several other users.
Moreover, a borrower whose demand is strictly less than C plus the capacity he must
borrow subsequently becomes a lender himself.) As we shall show, even this primitive
formulation is combinatorially rich and exceedingly difficult to optimize.

* This author’s research has been supported in part by the National Science Foundation under grants
ECS-8403859, MIP-8603879 and MIP-8919312, and by the Office of Naval Research under contract

This  model  can  be  interpreted  in  several  ways.  It  was  first  brought  to  our  attention 
as a problem of efficiently generating random choices from a finite set [5]. Here,
the  well-known  “square-the-histogram”  method  corresponds  to  the  limited  sharing 
problem  just  described,  with  the  objective  to  maximize  the  expected  number  of 
direct  memory  accesses.  (A  uniformly  generated  pseudo-random  number  that  falls 
in  a  shared  region  corresponds  to  a  unique  but  secondary  choice,  thus  requiring  an 
indirect  memory  fetch.  See  [9, 10]  for  more  details.)  One  can  also  visualize  the  model 
as  representative  of  a  distributed  computing  environment  in  which  the  local  memory 
of  a  processing  element  can  share  unused  storage  with  other  elements.  Limited  I/O 
porting,  communications  overhead,  memory  addressing  restrictions  (e.g.,  bounds 
registers)  and  a  host  of  other  hardware  and  software  limitations  can  severely  inhibit 
sharing. Our problem can even be viewed as one of resource balancing, a one-dimensional
analog of the problem of evenly distributing goods among a collection
of  warehouses,  where  transportation  costs  or  other  factors  dictate  that  excess  space 
must  be  occupied  by  goods  coming  from  only  one  other  warehouse.  Perhaps  the 
most superficially similar, previously studied problem is that of variable-sized bin
packing [2], where a bin is akin to a demand exceeding C, and a piece to be packed
is  much  like  a  user’s  excess  capacity. 

The remainder of this paper is organized as follows. In the next section, we
introduce some necessary notation, state the decision version of this problem, and
demonstrate  that  it  is  NP-complete.  We  also  define  a  natural  greedy  rule  G  whose 
analysis  makes  up  the  main  thrust  of  this  study.  Section  3  contains  our  proof  that, 
for  the  general  case,  G’s  solution  always  exceeds  half  the  optimum.  We  devise  a 
powerful,  easy-to-apply  chain  lemma  to  aid  in  the  analysis  and  show,  through  a 
family  of  problem  instances,  that  this  bound  is  asymptotically  tight.  Furthermore, 
we  discuss  a  number  of  sharing  alternatives  and  demonstrate  that  (in  the  worst-case 
sense)  these  appealing  but  more  complicated  rules  do  no  better  than  G.  In  Section 
4,  we  consider  G’s  behavior  in  more  restricted  settings.  For  some  (most  notably, 
when  all  demands  exceeding  C  are  equal  and  all  demands  less  than  C  are  equal),  we 
prove  that  G  is  optimal.  We  address  research  directions  and  related  issues  in  the 
closing  section,  showing  that  this  problem  is  so  difficult  that  when  it  is  generalized 
slightly  by  allowing  users  to  possess  unequal  resource  capacities,  the  problem  of 
determining  whether  any  feasible  solution  exists  is  NP-complete. 


2.  Notation  and  preliminaries 


Without loss of generality, we assume a scaling so that the common resource
capacity C is a positive integer, and so that the elements of the initial demand list
$D^{(0)} = (d_1^{(0)}, d_2^{(0)}, \ldots, d_n^{(0)})$ are nonnegative integers, where the quantity $d_i^{(0)}$, $1 \le i \le n$,
denotes the initial demand of user i. For simplicity, we assume a user indexing whereby
the initial demand list is sorted so that $d_1^{(0)} \ge d_2^{(0)} \ge \cdots \ge d_n^{(0)}$. We dynamically alter
the demand list as we implement limited sharing, requiring that, at any time t, $D^{(t)}$
denotes the current demand list, with $d_i^{(t)}$ representing the current demand of user i.

Let $A^{(t)}$ denote the list made up of the elements of $D^{(t)}$ that are greater than or
equal to C. Let $|A^{(t)}|$ denote the number of elements in $A^{(t)}$. Similarly, let $B^{(t)}$
denote $D^{(t)} - A^{(t)}$ (that is, $B^{(t)}$ contains every element of $D^{(t)}$ that is less than C),
where $|B^{(t)}|$ denotes the number of elements in $B^{(t)}$. Thus $A^{(0)}$ ($B^{(0)}$) is a prefix
(suffix) of $D^{(0)}$.

To define limited sharing formally, we now describe how $D^{(t)}$ (and, hence, $A^{(t)}$
and $B^{(t)}$) may be altered in an effort to satisfy all demands. As long as $|B^{(t)}| > 0$,
we denote a sharing action by the ordered pair of indices (a, b), where $d_a^{(t)} > C$ and
$d_b^{(t)} < C$. For notational convenience, we associate one sharing action with the
passage of one time unit. The effect of a sharing action at time t + 1 is to transform
$D^{(t)}$ into $D^{(t+1)}$ by replacing $d_a^{(t)}$ with $d_a^{(t+1)} = d_a^{(t)} + d_b^{(t)} - C$ and replacing $d_b^{(t)}$ with
$d_b^{(t+1)} = C$. Thus a sharing action directs user b to allocate all of its excess capacity
to user a.

Note that a sharing action is possible whenever $|B^{(t)}| > 0$. Moreover, such an
action (a, b) denies user b the opportunity to participate in a subsequent action.
Therefore, it is elementary that a series of at most n - 1 sharing actions produces
a feasible solution, forcing every demand to converge on C. We term such a series
a sharing sequence.

For a sharing sequence S with k < n sharing actions (which by definition produces
$D^{(k)}$ such that $d_1^{(k)} = d_2^{(k)} = \cdots = d_n^{(k)} = C$), we define the unshared capacity of user i
to be the minimum value in the set $\{d_i^{(t)} : 0 \le t \le k\}$. Notice that this value is exactly
C if and only if the demand of user i was represented in $A^{(t)}$ at every time t, $0 \le t \le k$.
Given $D^{(0)}$ and an applicable sharing sequence S, we define the total unshared
capacity in the obvious way, as the sum of every user’s unshared capacity.
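The definitions of a sharing action and of unshared capacity can be made concrete with a short sketch. It assumes 0-based user indices and takes C to be the average demand, as saturation requires; the function name `total_unshared_capacity` is illustrative and not part of the paper.

```python
def total_unshared_capacity(demands, actions):
    """Apply sharing actions (a, b) in order and return the total unshared capacity."""
    n = len(demands)
    total = sum(demands)
    if n == 0 or total % n != 0:
        raise ValueError("saturation requires an integer C = sum(demands) / n")
    C = total // n
    d = list(demands)
    lowest = list(demands)          # running minimum of each user's demand over time
    for a, b in actions:
        if not (d[a] > C and d[b] < C):
            raise ValueError(f"({a}, {b}) is not a valid sharing action")
        d[a] += d[b] - C            # user a absorbs the excess C - d[b] of user b
        d[b] = C                    # user b has lent everything it could
        lowest[a] = min(lowest[a], d[a])
    if any(x != C for x in d):
        raise ValueError("the actions do not form a complete sharing sequence")
    # Every demand ends at C, so a user's unshared capacity never exceeds C.
    return sum(min(x, C) for x in lowest)

# Five users with C = 3: user 1 borrows from user 4, then user 0 from users 2 and 3.
print(total_unshared_capacity([7, 6, 1, 1, 0], [(1, 4), (0, 2), (0, 3)]))
```

A borrower that drops below C keeps only its reduced demand as unshared capacity, which is why the order and pairing of actions affects the total.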

In  the  sequel,  we  shall  restrict  our  attention  to  solution  strategies  that  operate  by 
producing  a  sharing  sequence.  That  is,  such  a  strategy  must  direct  that  (1)  user  b 
can  lend  only  when  its  demand  is  less  than  C  and  (2)  user  a  can  borrow  only  when 
its  demand  is  greater  than  C.  Condition  (1)  is  merely  a  convenience,  since  a  user  can 
lend  only  once.  If  we  first  compute  the  amount  to  be  lent  and  to  whom  it  goes,  the 
time  of  the  relevant  sharing  action  is  immaterial.  Condition  (2),  on  the  other  hand, 
is  significant  since  a  user  can  borrow  repeatedly,  and  our  restriction  is  justified  as 
follows. 

Theorem  2.1.  An  optimal  solution  cannot  direct  that  excess  capacity  be  allocated 
to  a  user  whose  current  demand  is  less  than  or  equal  to  C. 

Proof. Suppose otherwise for an optimal solution that, at some time t, directs user
j with $d_j^{(t-1)} < C$ to allocate its excess to user i with $d_i^{(t-1)} \le C$. Without loss of
generality, suppose the optimization rule next directs user i to allocate its excess to
some user h. Then $d_h^{(t+1)} = d_h^{(t)} + d_i^{(t)} - C = d_h^{(t)} + d_i^{(t-1)} + d_j^{(t-1)} - 2C$, since the
unshared capacity of user i is $d_i^{(t)} = d_i^{(t-1)} + d_j^{(t-1)} - C$, and that of user j is simply
$d_j^{(t-1)}$. In this situation, we can modify the solution by directing user j to allocate
its excess directly to user h at time t, then next directing user i to allocate its excess
to user h as well. This modification preserves $d_h^{(t+1)} = d_h^{(t-1)} + d_i^{(t-1)} + d_j^{(t-1)} - 2C$
(in the original solution, $d_h^{(t)} = d_h^{(t-1)}$). The unshared capacity of user j is still
$d_j^{(t-1)}$. But now the unshared capacity of user i is $d_i^{(t-1)} > d_i^{(t-1)} + d_j^{(t-1)} - C$,
contradicting the presumption that the original solution maximized the unshared
capacity. □

We now address the complexity of limited sharing with the following decision
version of the problem.

LIMITED  SHARING  PROBLEM  (LSP) 

Input.  A  positive  integer  Q  and  a  list  L  of  n  nonnegative  integers. 

Question. Is there a sharing sequence, using $D^{(0)} = L$ sorted in nonincreasing
order and using $C = \sum_{i=1}^{n} d_i^{(0)} / n$, for which the total unshared capacity is at least Q?

This  problem  has  already  been  shown  to  be  NP-complete  [9].  We  shall  present 
our  simple  proof  of  its  complexity  here  for  the  purpose  of  illustration  and  because 
it  will  be  referenced  and  built  upon  in  proving  subsequent  results. 

Theorem 2.2. LSP is NP-complete.

Proof. LSP is clearly in NP, since a candidate solution can be easily checked in
polynomial time. To show that LSP is NP-hard, transform any instance of PARTITION
[4] into an instance of LSP as follows. Let the list of integers input to PARTITION
be denoted by $P = (p_1, p_2, \ldots, p_m)$, where $p_1 \ge p_2 \ge \cdots \ge p_m$. Let s denote $\sum_{i=1}^{m} p_i$.

Set $n = m + 2$, set $Q = n p_1 - s$, and set $L = (l_1, l_2, \ldots, l_n)$, where $l_1 = l_2 = p_1 + s/2$ and
$l_{i+2} = p_1 - p_{m-i+1}$ for $1 \le i \le m$. The answer to this instance of LSP is “yes” if and
only if the answer to the given instance of PARTITION is “yes”. □
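The transformation in the proof is mechanical and can be sketched directly. The function name `partition_to_lsp` is illustrative; rejecting an odd total is an assumption added here so that $l_1 = l_2 = p_1 + s/2$ stays integral (such PARTITION instances have answer “no” in any case), and is not something the proof itself discusses.

```python
def partition_to_lsp(p):
    """Map a PARTITION instance p to an LSP instance (Q, L) as in Theorem 2.2."""
    p = sorted(p, reverse=True)             # p_1 >= p_2 >= ... >= p_m
    m, s = len(p), sum(p)
    if m == 0 or s % 2 != 0:
        raise ValueError("expects a nonempty list with an even sum (assumption)")
    n = m + 2
    Q = n * p[0] - s
    # l_1 = l_2 = p_1 + s/2, and l_{i+2} = p_1 - p_{m-i+1} for 1 <= i <= m.
    L = [p[0] + s // 2, p[0] + s // 2] + [p[0] - p[m - i] for i in range(1, m + 1)]
    return Q, L

Q, L = partition_to_lsp([3, 1, 1, 2, 2, 1])
print(Q, L, sum(L) // len(L))               # the last value is the capacity C = p_1
```

In the resulting instance the two large demands each exceed C by s/2, and the excess capacities of the remaining users are exactly the PARTITION values, so reaching Q requires splitting those values into two halves of equal sum.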

Thus we conclude that finding a sharing sequence that maximizes the total
unshared capacity is NP-hard. Hence we turn our attention to the design and analysis
of fast approximation algorithms. For a heuristic algorithm ALG and an optimization
algorithm OPT, we use ALG(I) and OPT(I) to denote their respective total
unshared capacities for LSP instance I. We seek to establish a worst-case ratio $R_{\mathrm{ALG}}$,
defined as follows:

$$R_{\mathrm{ALG}} = \sup \left\{ \frac{\mathrm{OPT}(I)}{\mathrm{ALG}(I)} : I \text{ is an instance of LSP} \right\}.$$

If we can prove that $R_{\mathrm{ALG}}$ is bounded above by some constant, then we say that
ALG is a relative approximation algorithm.

For notational convenience, we shall also employ the term $\mathrm{ALG}_A(I)$ to denote
the contribution to ALG(I) made by the users whose demands were represented in
$A^{(0)}$ for instance I. We define $\mathrm{ALG}_B(I)$ analogously for $B^{(0)}$, so that $\mathrm{ALG}(I) =
\mathrm{ALG}_A(I) + \mathrm{ALG}_B(I)$.

At every time t, our greedy rule G (dubbed the “Robin Hood” rule in [10]) simply
selects a so as to maximize $d_a^{(t)}$ and b so as to minimize $d_b^{(t)}$. That is, it iteratively
directs that user b, currently with the largest excess capacity, lend it all to user a,
currently with the greatest demand. Ties are broken in favor of the user with the
lower index. The time complexity of G is $O(n \log n)$, since $O(n \log n)$ time is
sufficient for the initial sorting of the n demands, and since the values for a and b during
each of G’s at most n - 1 sharing actions can be determined in $O(\log n)$ time with
the use of a max-heap for A and a min-heap for B.
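The heap-based implementation described above can be sketched as follows. This is a minimal sketch, assuming 0-based indices, a demand list already sorted in nonincreasing order (so that index order matches the tie-breaking rule), and Python's `heapq` with negated keys standing in for the max-heap; the function name `greedy_robin_hood` is illustrative.

```python
import heapq

def greedy_robin_hood(demands):
    """Run rule G; return (total unshared capacity, list of sharing actions)."""
    n = len(demands)
    total = sum(demands)
    if n == 0 or total % n != 0:
        raise ValueError("saturation requires an integer C = sum(demands) / n")
    C = total // n
    d = list(demands)
    unshared = [min(x, C) for x in d]       # borrowers default to C, lenders to their demand
    borrowers = [(-x, i) for i, x in enumerate(d) if x > C]   # max-heap via negation
    lenders = [(x, i) for i, x in enumerate(d) if x < C]      # min-heap
    heapq.heapify(borrowers)
    heapq.heapify(lenders)
    actions = []
    while lenders:                          # saturation guarantees a borrower exists
        _, b = heapq.heappop(lenders)       # smallest demand, lowest index on ties
        _, a = heapq.heappop(borrowers)     # greatest demand, lowest index on ties
        unshared[b] = d[b]                  # b keeps only what it had when it lent
        d[a] += d[b] - C
        d[b] = C
        actions.append((a, b))
        if d[a] > C:
            heapq.heappush(borrowers, (-d[a], a))
        elif d[a] < C:
            heapq.heappush(lenders, (d[a], a))   # a overshot and becomes a lender
    return sum(unshared), actions

print(greedy_robin_hood([7, 6, 1, 1, 0]))
```

Each iteration performs a constant number of heap operations, and at most n - 1 iterations occur, which matches the stated $O(n \log n)$ bound.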


3.  The  general  case 

The main goal of this section is to establish the exact value of $R_G$. We first
demonstrate that $R_G$ exceeds any real number strictly less than 2.


Example 3.1. A troublesome instance $I_k$ for G.

Let k denote any integer exceeding 1 and let n = 3k - 1.
Let $D^{(0)}$ be defined as follows:

$$d_1^{(0)} = k^2 + k + 1,$$

$$d_i^{(0)} = 2k + 2, \quad \text{for } 1 < i \le k,$$

$$d_i^{(0)} = 1, \quad \text{for } k < i \le 2k,$$

$$d_i^{(0)} = 0, \quad \text{for } 2k < i \le 3k - 1.$$

Thus C = k + 1, and

$$R_G \ge \frac{\mathrm{OPT}(I_k)}{G(I_k)} = \frac{k(k + 2)}{k(k + 5)/2} = 2 - \frac{6}{k + 5},$$

which approaches arbitrarily close to 2 from below as k grows without bound.


In Example 3.1, observe that G’s first k - 1 sharing actions distribute most of $d_1^{(0)}$
across the last k - 1 users, and so $d_1^{(k-1)} = k^2 + k + 1 - (k - 1)(k + 1) = k + 2$. G’s next
k - 1 sharing actions produce $d_i^{(2k-2)} = k + 2$, for $1 \le i \le k$, and $B^{(2k-2)} = (d_{2k}^{(2k-2)} = 1)$.
Therefore $G_A(I_k) = \sum_{i=1}^{k} (i + 1) = k(k + 3)/2$. Since $G_B(I_k) = k$, we conclude that $G(I_k) =
k(k + 3)/2 + k = k(k + 5)/2$. A better solution can be devised by first directing that
the unsatisfied demand of user i be allocated the excess capacity of user $j = i + 2k - 1$,
for $1 < i \le k$, from which it follows that $\mathrm{OPT}(I_k) = k(k + 1) + k = k(k + 2)$.

Thus we know that 2 is a lower bound on $R_G$. We now proceed to prove that 2
is an upper bound as well.
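As a concrete check of Example 3.1, the smallest case k = 2 can be traced by hand. The sketch below assumes the instance is read as $d_1^{(0)} = k^2 + k + 1$, $d_i^{(0)} = 2k + 2$ for $1 < i \le k$, followed by k demands of 1 and k - 1 demands of 0, which is the reading consistent with saturation at C = k + 1; users are numbered from 1 as in the text.

```latex
% k = 2: n = 5, C = 3, D^{(0)} = (7, 6, 1, 1, 0)
\[
\begin{array}{lll}
\text{Rule} & \text{Sharing sequence } (a, b) & \text{Unshared capacities and total} \\
\hline
G           & (1,5),\ (2,3),\ (1,4),\ (2,1) & (2, 3, 1, 1, 0), \quad G(I_2) = 7 = k(k+5)/2 \\
\text{OPT}  & (2,5),\ (1,3),\ (1,4)         & (3, 3, 1, 1, 0), \quad \mathrm{OPT}(I_2) = 8 = k(k+2)
\end{array}
\]
% Ratio: 8/7 = 2 - 6/(k + 5) at k = 2.
```

Under G, user 1 overshoots to a demand of 2 and must then lend to user 2, so it retains only 2 units unshared; the better sequence avoids any such overshoot.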

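Lemma 3.2. If $\mathrm{ALG}_A(I) \ge |A^{(0)}| C / 2$, then $\mathrm{OPT}(I) \le 2\,\mathrm{ALG}(I)$.

Proof. Clearly, $\mathrm{OPT}_A(I) \le |A^{(0)}| C$. Any algorithm, ALG, that produces a sharing
sequence in an attempt to solve LSP ensures that $\mathrm{ALG}_B(I)$ is precisely the total
demand in $B^{(0)}$, and hence $\mathrm{ALG}_B(I) = \mathrm{OPT}_B(I)$. Thus it follows that, if $\mathrm{ALG}_A(I) \ge
|A^{(0)}| C / 2$, then $\mathrm{OPT}(I) = \mathrm{OPT}_A(I) + \mathrm{OPT}_B(I) \le |A^{(0)}| C + \mathrm{ALG}_B(I) \le 2\,\mathrm{ALG}_A(I) +
\mathrm{ALG}_B(I) \le 2\,\mathrm{ALG}(I)$. □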

In order to state and prove the next lemma, it will be helpful first to define a chain
of users, each of whose demands were represented in $A^{(0)}$, with respect to a sharing
sequence S. We denote such a chain of length k by the ordered list of indices
$U = (u_1, u_2, \ldots, u_k)$, where $1 \le u_h \le |A^{(0)}|$ for $1 \le h \le k$. At some time $t_1 > 0$, the first
member of such a chain, $u_1$, must occur in a sharing action of S of the form
$(a_1 = u_1, b_1)$, where $d_{b_1}^{(0)}$ is in $B^{(0)}$ and $d_{a_1}^{(t_1)} < C$. (Thus $d_{b_1}^{(0)} = d_{b_1}^{(t_1 - 1)} < C$, and $d_{b_1}^{(t_1)} = C$.)
If, at some time $t_2 > t_1$, $u_1$ occurs in a sharing action of the form $(a_2, b_2 = u_1)$ where
$d_{a_2}^{(t_2)} < C$, then $u_2 = a_2$ is the second user in the chain, and so on. The chain ends with
$u_k$ when, at time $t_{k+1}$, $u_k$ occurs in a sharing action of the form $(a_{k+1}, b_{k+1} = u_k)$,
where $d_{a_{k+1}}^{(t_{k+1})} \ge C$.

Lemma 3.3 (the chain lemma). If H is an LSP heuristic that produces a sharing
sequence in which, at every given time t, a is selected so as to maximize $d_a^{(t)}$, then the
contribution to $H_A(I)$ made by the users in a user chain of length k is at least kC/2.

Proof. Let H satisfy the statement of the lemma. Let I denote an instance of LSP with
chain $U = (u_1, u_2, \ldots, u_k)$ for some k > 0. Since H produces a sharing sequence, the
contribution to $H_A(I)$ made by the users in U is $\sum_{h=1}^{k} d_{u_h}^{(t_h)}$. By H’s choice of a for
each sharing action, $d_{u_h}^{(t_h - 1)} \ge d_{u_{h+1}}^{(t_h - 1)}$ for $1 \le h < k$. Therefore, $d_{u_1}^{(t_1)} \ge d_{u_2}^{(t_2)} - d_{u_1}^{(t_1)}$. Also,
$d_{u_h}^{(t_h)} - d_{u_{h-1}}^{(t_{h-1})} \ge d_{u_{h+1}}^{(t_{h+1})} - d_{u_h}^{(t_h)}$ for $1 < h < k$, and $d_{u_k}^{(t_k)} - d_{u_{k-1}}^{(t_{k-1})} \ge C - d_{u_k}^{(t_k)}$. From this set of
k inequalities we obtain $d_{u_1}^{(t_1)} \ge C/(k + 1)$, else we derive

$$C = d_{u_1}^{(t_1)} + \bigl(d_{u_2}^{(t_2)} - d_{u_1}^{(t_1)}\bigr) + \cdots + \bigl(C - d_{u_k}^{(t_k)}\bigr) < (k + 1)C/(k + 1) = C,$$

which is impossible. Similarly, we obtain $d_{u_h}^{(t_h)} \ge hC/(k + 1)$ for $1 \le h \le k$. Hence we
conclude that $\sum_{h=1}^{k} d_{u_h}^{(t_h)} \ge \sum_{h=1}^{k} hC/(k + 1) = (k(k + 1)/2)\,C/(k + 1) = kC/2$. □

We  now  establish  the  general  utility  of  the  chain  lemma. 

Theorem 3.4. If H is an LSP heuristic that produces a sharing sequence in which,
at every time t, a is selected so as to maximize $d_a^{(t)}$, then $R_H \le 2$.

Proof. Given such an H and any instance I of LSP, no user can be represented in
more than one user chain. Moreover, there must always exist at least one user from
$A^{(0)}$ that is not represented in any chain. We apply the chain lemma to every chain
in I, guaranteeing that $H_A(I) \ge (|A^{(0)}| - 1)C/2 + C > |A^{(0)}| C / 2$, from which Lemma
3.2 tells us that $\mathrm{OPT}(I)/H(I) \le 2$. Therefore, by the definition of worst-case ratio,
$R_H \le 2$. □

Corollary 3.5. $R_G = 2$.

Proof.  Follows  immediately  from  Example  3.1  and  Theorem  3.4.  □ 

Thus G is a relative approximation algorithm. That LSP permits such an
algorithm is interesting, since some NP-hard problems do not (see, for example, the
non-Euclidean traveling salesman problem [8] or the weighted depth-first spanning tree
problem [1]).

Are there relative approximation algorithms with lower worst-case ratios? A
second alternative algorithm, which we denote by A2, selects a as does G so as to
maximize $d_a^{(t)}$, but selects b so as to maximize $d_b^{(t)}$ as well. Thus A2 is nearly
optimal for Example 3.1. Unfortunately, A2 asymptotically fares no better than G in
the worst case.

Example 3.6. A troublesome instance $I_k$ for A2.

Let k denote any integer exceeding 1 and let $n = k^2 + 2k + 2$.

Let $D^{(0)}$ be defined as follows:

$$d_i^{(0)} = k^2 + 2k + 1, \quad \text{for } 1 \le i \le k + 1,$$

$$d_i^{(0)} = 1, \quad \text{for } k + 1 < i \le 2k + 2,$$

$$d_i^{(0)} = 0, \quad \text{for } 2k + 2 < i \le k^2 + 2k + 2.$$

Thus C = k + 1, and

$$R_{A2} \ge \frac{\mathrm{OPT}(I_k)}{A2(I_k)} = \frac{(k + 1)(k + 2)}{(k + 1)(k + 4)/2} = 2 - \frac{4}{k + 4},$$

which approaches arbitrarily close to 2 from below as k grows without bound.

In Example 3.6, observe that an optimal sharing sequence can be devised by first
directing that the unsatisfied demand of user 1 be allocated the excess capacity of
the k + 1 users with indices in the range [k + 2, 2k + 2].

Corollary 3.7. $R_{A2} = 2$.

Proof.  Follows  immediately  from  Example  3.6  and  Theorem  3.4.  □ 

A third alternative A3 selects a so as to minimize $d_a^{(t)} > C$ and b so as to minimize
$d_b^{(t)}$ (in which case the chain lemma does not apply). Thus A3 is optimal for both
Examples 3.1 and 3.6. However, A3 fails to certify as even a relative approximation
algorithm.

M.A. Langston, M.P. Morford

Example 3.8. A troublesome instance $I_k$ for A3.

Let k denote any integer exceeding 1 and let n = 2k + 1.

Let $D^{(0)}$ be defined as follows:

$$d_1^{(0)} = k^2,$$

$$d_i^{(0)} = k + 1, \quad \text{for } 1 < i \le k + 1,$$

$$d_i^{(0)} = 0, \quad \text{for } k + 1 < i \le 2k + 1.$$

Thus C = k, and

$$R_{A3} \ge \frac{\mathrm{OPT}(I_k)}{A3(I_k)} = \frac{k(k + 3)/2}{2k} > \frac{k}{4},$$

which is unbounded above.

A fourth alternative A4 selects a so as to minimize $d_a^{(t)} > C$ and b so as to maximize
$d_b^{(t)}$. Like A3, A4 is not a relative approximation algorithm.

Example 3.9. A troublesome instance $I_k$ for A4.

Let k denote any integer exceeding 1 and let n = k + 2.

Let $D^{(0)}$ be defined as follows:

$$d_1^{(0)} = 2k^2 - k,$$

$$d_i^{(0)} = k^2 + 1, \quad \text{for } 1 < i \le k + 1,$$

$$d_{k+2}^{(0)} = 0.$$

Thus $C = k^2$, and

$$R_{A4} \ge \frac{\mathrm{OPT}(I_k)}{A4(I_k)} = \frac{k(k + 1)(k - 1/2)}{k(3k + 1)/2} > \frac{k}{2},$$

which is unbounded above.

What about a “best fit” strategy of some sort? For example, such an alternative
A5 could initially choose a and b as G would, but then, if $d_a^{(t)} - C > C - d_b^{(t)}$, reselect
a so that $d_a^{(t)}$ is a least demand satisfying $d_a^{(t)} - C \ge C - d_b^{(t)}$. But this scheme is
asymptotically no better than G. To see this, simply modify Example 3.1 by setting
$d_i^{(0)} = 2k + 1$ for $1 < i \le k$ and $d_i^{(0)} = 2$ for $k < i \le 2k - 1$, in which case A5 reduces to G.

Corollary 3.10. $R_{A5} = 2$.

Proof. Follows immediately from the modification to Example 3.1 described above
and the observation that the chain lemma (and hence Theorem 3.4) holds as long
as H maximizes $d_a^{(t)}$ whenever it must start a new chain or append to an existing
chain, at which time $d_a^{(t)} - C < C - d_b^{(t)}$. □

Finally, what of “compound” algorithms (those that implement multiple heuristics
and select the best solution produced)? Although this approach may work well in
practice, it might not yield improved worst-case behavior or its analysis might be
exceedingly difficult (proofs of improved worst-case behavior for compound
algorithms are very rare [3,6]). To illustrate, consider alternative $A_6$, a compound
algorithm running both G and $A_2$.

Corollary 3.11. $R_{A_6} = 2$.

Proof.  Follows  immediately  from  Example  3.6  (although  the  lower  bound  thereby 
obtained  for  G  is  neither  as  simple  nor  as  fast-growing  as  the  one  exhibited  in 
Example  3.1)  and  either  Corollary  3.5  or  3.7.  □ 

As of this writing, we know of no polynomial-time algorithm ALG with $R_{ALG} < 2$,
and suspect that guaranteeing a bound strictly less than 2 may be NP-hard.


4.  Special  cases  of  LSP 

In this section, we investigate G’s behavior in more restricted problem settings. We
know from Example 3.1 that G may not guarantee an optimal solution when $n \ge 5$.

Theorem 4.1 [7]. If $I$ denotes any instance of LSP with $n \le 4$, then $G(I) = OPT(I)$.

Suppose  all  demands  greater  than  C  are  alike,  as  are  all  demands  less  than  C. 

Theorem 4.2. If $I$ denotes any instance of LSP with $d_i^{(0)} = p > C$ or $d_i^{(0)} = q < C$ for
every $i \in [1, n]$, then $G(I) = OPT(I)$.

Proof. Suppose otherwise, and let $I$ denote a counterexample with $|D^{(0)}| = n$. Without
loss of generality, we assume that $I$ is minimal. That is, no counterexample exists
with fewer than $n$ demands. Since, by assumption, $D^{(0)}$ is sorted in nonincreasing
sequence, there exists a unique $k$, where $1 \le k < n$, such that
$d_1^{(0)} = d_2^{(0)} = \cdots = d_k^{(0)} = p$ and $d_{k+1}^{(0)} = d_{k+2}^{(0)} = \cdots = d_n^{(0)} = q$.

Let us now scrutinize an optimal sharing sequence $S$ to discover how it can provide
a greater total unshared capacity than does G. $S$ must contain exactly $n-k$
sharing actions of the form $(a,b)$, with $b > k$. Since the capacity shared in each of
these actions is fixed at $C-q$, we can for simplicity assume that these $n-k$ actions
occur first in $S$. This implies that $S$ must distribute the excess capacity in $B^{(0)}$ (which
is $(n-k)(C-q)$) in a manner fundamentally different from that of G, else the two
sublists $A^{(n-k)}$ and $B^{(n-k)}$ constitute a counterexample with $k < n$ demands,
violating the presumed minimality of $I$. Moreover, some demand represented in
$A^{(n-k)}$ must exceed $C + (C-q) = 2C - q$, since otherwise, to be different from G,
$S$ would have to leave some user in $B^{(n-k)}$ with an unshared capacity less than
$C - (C-q) = q$, which is impossible for a sharing sequence.

We  shall  now  prove  that  S  is  not  optimal,  thereby  deriving  a  contradiction.  Let 
g  denote  the  index  of  an  arbitrary  user  with  dpk^>2C— q.  From  the  set  of  sharing 
actions  (after  time  n  —  k)  of  the  form  (g,  b  <  k ),  select  t  so  as  to  maximize  !)  -  dj?\ 

Define  h  as  the  lender  at  time  t.  That  is,  the  action  at  time  t  is  (g,h). 

We insist that $(g,h)$ be the last sharing action with $g$ as the borrower. If this is
not the case, we modify $S$ at no cost as follows. Define $t_l$ as the time of the last
action of the form $(g, b)$. We delay the action $(g, h)$ until time $t_l$, and advance by one
time unit the subsequence of $t_l - t$ actions originally scheduled to begin at time
$t + 1$. After this, we reset $t$ to $t_l$. By our choice of $h$, this modification produces a
sharing sequence, with no loss in the total unshared capacity.

Recalling  the  definition  of  a  user  chain  as  presented  in  Section  3,  we  observe  that 
S  contains  a  chain  U=(uuu2, ... ,  uk)  such  that  for  some  /,  1  </</:,  U;  =  h  and  thus 
ti+ 1  =  t.  If  /  =  k ,  then  we  know  that  d ^  =  C.  If  i<k,  then  we  know  that  ui+  {  =g  and 
Let  t'>n  ~  k>  tx  denote  the  earliest  time  at  which  dp<2C-q.  For  future 
reference,  let  x=C-dpl)>0  and  _y  =  C-^0>0.  Note  that  x>y. 

We  now  modify  S  as  follows.  First,  we  change  the  sharing  action  at  time  tx  from 
(Wj,  b>k)  to  (g,b).  Next,  we  change  every  sharing  action  of  the  form  (g,  b<k)  that 
occurs  after  time  V  but  before  time  t  to  ( ux,b ).  We  delete  the  action  (g,h)  at  time 
t .  We  insert  in  its  place  the  action  (u{jg)9  but  only  if  is  now  less  than  C. 

Finally,  we  check  to  see  if  there  is  a  subsequent  action  of  the  form  (/  g)  and,  if 
so,  delete  it,  inserting  in  its  place  the  action  (/,/z).  In  the  new  sequence,  let  z- 
C-d{p> 0.  Note  that,  by  our  choice  of  h  and  *>z. 

Consider the effect of this modification. To the unshared capacity of user $g$, we
have added $y - z$, which may reflect an increase or a decrease. To the unshared
capacity of $u_1$ (and hence also to the unshared capacity of each user $u_j$, $1 \le j \le i$), we
have added $x - y$, which is an increase, and which is small enough not to halt $U$
prematurely. The unshared capacity of all other users, including any that follow $g$
in G, is unaltered. Therefore, the total unshared capacity is increased by
$(y - z) + i(x - y) = x - z + (i - 1)(x - y) > 0$. (Furthermore, the actions at user $u_1$ may not now
even constitute a sharing sequence, in which case a further increase may be
obtainable.) We conclude that $S$ as originally given was not optimal, a contradiction.

□ 

Suppose only the demands greater than (less than) $C$ are equal. Let LSP$_A$ (LSP$_B$)
denote all problem instances under this restriction.

Corollary 4.3. LSP$_A$ is NP-complete.

Proof. Follows immediately from the construction used in the proof of Theorem 2.2. □

Corollary 4.4. When restricted to instances of LSP$_A$, $R_G = 2$.

Proof.  Follows  immediately  from  Theorem  3.4  and  from  observing  G’s  behavior 
when  applied  to  the  family  of  instances  defined  in  Example  3.6.  □ 

Theorem 4.5. LSP$_B$ is NP-complete.

Proof. Modify the proof of Theorem 2.2 as follows. Set $n = 2m + 2$. Set $Q = 2ms + 3s$.
Set $l_1 = l_2 = 5s/2$. Set $l_{i+2} = 4s - p_i$ for $1 \le i \le m$ and $l_i = 0$ for $m + 2 < i \le 2m + 2$. (Thus
$C = 2s$, and all demands less than $C$ are equal. By this choice of $Q$, there can be no
sharing action of the form $(a,b)$ where $a = 1$ or $2$ and $b > m + 2$, so that the first $m$
sharing actions must absorb all the excess capacity in $B^{(0)}$.) The answer to this
instance of LSP$_B$ is “yes” if and only if the answer to the given instance of
PARTITION is “yes”. □
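The demand list used in this reduction can be written out and checked numerically. The sketch below is illustrative only: it assumes that $s$ is the sum of the PARTITION items $p_1, \ldots, p_m$ (consistent with the construction in Theorem 5.1, though the proof of Theorem 2.2 is not reproduced here), that $C$ is the average demand, and the function name `lsp_b_instance` is hypothetical.

```python
from fractions import Fraction

def lsp_b_instance(p):
    """Build the demand list l_1..l_{2m+2} from PARTITION items p.

    Assumes s = sum(p). Returns (demands, C, Q)."""
    m, s = len(p), sum(p)
    demands = [Fraction(5 * s, 2)] * 2                 # l_1 = l_2 = 5s/2
    demands += [Fraction(4 * s - p_i) for p_i in p]    # l_{i+2} = 4s - p_i
    demands += [Fraction(0)] * m                       # l_i = 0, m+2 < i <= 2m+2
    capacity = sum(demands) / len(demands)             # average demand
    q_bound = 2 * m * s + 3 * s                        # Q = 2ms + 3s
    return demands, capacity, q_bound

demands, C, Q = lsp_b_instance([3, 1, 1, 2, 2, 1])
s = 10
assert len(demands) == 2 * 6 + 2          # n = 2m + 2
assert C == 2 * s                         # C = 2s
# All demands below C coincide, as the restriction LSP_B requires.
assert len({d for d in demands if d < C}) == 1
```

Under these assumptions the average demand is $2s$ and every demand below $C$ is zero, matching the parenthetical remark in the proof.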

Although $R_G$ for LSP$_B$ cannot exceed 2 (by Theorem 3.4), its exact value is an
open issue as of this writing. We conjecture that it is strictly less than 2. To justify
this sentiment, we observe that our worst examples for G depend on G at some point
driving a demand in $A$ nearly to zero, while OPT is able to avoid doing the same.
However, as the construction employed in the proof of Theorem 4.5 suggests, if an
instance of LSP$_B$ is difficult for G, then $C$ is so large that no demand in $A$ can be
driven nearly to zero by G but not by OPT.

Finally, as an interesting problem generalization, suppose that every user’s demand
must be greater than or equal to some minimum threshold value $\tau \in [0, C]$. This
models a system in which each resource has some fixed capacity $\tau \le C$ that is
dedicated to the primary user. (For example, in a distributed computing realization,
a user’s operating system kernel must remain resident in his local memory.) Let
LSP$_\tau$ denote all instances of this problem.

Corollary 4.6. LSP$_\tau$ is NP-complete.

Proof. Follows immediately from Theorem 2.2 (LSP$_\tau$ includes LSP$_0$ as a special
case). □

Theorem 4.7. For LSP$_\tau$, $(2C + 4\tau)/(C + 5\tau) \le R_G \le 2C/(C + \tau)$. Moreover, these
bounds are tight when $\tau = 0$ or $\tau = C$.

Proof. Suppose $\tau \in (0, C)$. To demonstrate the lower bound, modify Example 3.1 as
follows. Add $\tau$ to every demand, and hence to $C$ as well. Thus $C = \tau + k + 1$ gives
both $C$ and $\tau$ in terms of $k$. (For instance, if $\tau = C/2$, then $C = 2k + 2$ and $\tau = k + 1$.)



Therefore $OPT(I_k) = OPT_A(I_k) + OPT_B(I_k) = kC + ((2k-1)\tau + k) = kC + 2k\tau + k - \tau$
and $G(I_k) = G_A(I_k) + G_B(I_k) = k\tau + k(C-\tau)/2 + k + ((2k-1)\tau + k) = k(C+\tau)/2 + 2k\tau + 2k - \tau$.
So $R_G \ge (2C + 4\tau + (2 - 2\tau/k))/(C + 5\tau + (4 - 2\tau/k))$, which approaches
arbitrarily close to $(2C + 4\tau)/(C + 5\tau)$ from below as $k$ grows without bound (since,
for any fixed $\tau$, $\tau/k$ is bounded above by a constant).

To prove the upper bound, let $I$ denote any instance of LSP$_\tau$. Clearly,
$OPT_A(I) \le |A^{(0)}|C$. From the proof of the chain lemma and its application in
Theorem 3.4, we derive $G_A(I) \ge |A^{(0)}|(\tau + (C - \tau)/2) = |A^{(0)}|(C + \tau)/2$. Therefore,
since $OPT_B(I) = G_B(I)$, we have

$$\frac{OPT(I)}{G(I)} \le \frac{|A^{(0)}|C + OPT_B(I)}{|A^{(0)}|(C+\tau)/2 + G_B(I)} \le \frac{|A^{(0)}|C}{|A^{(0)}|(C+\tau)/2} = \frac{2C}{C+\tau}.$$

If $\tau = 0$, then the range of values specified in Theorem 4.7 collapses to 2, which
is confirmed by Corollary 3.5. If $\tau = C$, then the range of values collapses to 1, which
is confirmed by observing that no sharing exists in this extreme case. □
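The two bounds of Theorem 4.7 are easy to tabulate. The following is a small sketch (the function name `rg_bounds` is illustrative, not from the paper) that evaluates them exactly and checks the endpoint cases discussed in the proof, along with the value $\tau = C/2$.

```python
from fractions import Fraction

def rg_bounds(C, tau):
    """Return (lower, upper) bounds on R_G for LSP_tau from Theorem 4.7."""
    C, tau = Fraction(C), Fraction(tau)
    if C <= 0 or not (0 <= tau <= C):
        raise ValueError("require C > 0 and 0 <= tau <= C")
    lower = (2 * C + 4 * tau) / (C + 5 * tau)
    upper = (2 * C) / (C + tau)
    return lower, upper

# Endpoints: the interval collapses to 2 at tau = 0 and to 1 at tau = C.
assert rg_bounds(3, 0) == (2, 2)
assert rg_bounds(3, 3) == (1, 1)
# Midpoint tau = C/2: the bounds are 8/7 and 4/3.
assert rg_bounds(2, 1) == (Fraction(8, 7), Fraction(4, 3))
```

Exact rational arithmetic is used so that the endpoint checks are not affected by floating-point rounding.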


5.  Directions  for  future  research 


Several interesting questions remain unanswered. Is there any polynomial-time
algorithm ALG with $R_{ALG} < R_G$ for LSP? What is the exact value of $R_G$ for LSP$_B$?
Can our bounds on $R_G$ for LSP$_\tau$ be tightened when the threshold $\tau$ is strictly
greater than zero and less than $C$? (For example, Theorem 4.7 only guarantees that
$8/7 \le R_G \le 4/3$ when $\tau = C/2$.) And what of other special cases, such as when
$|A^{(0)}| > |B^{(0)}|$ or when there is an upper bound on any demand (say, for example,
$2C$)?

The  study  of  limited  sharing  can  be  expanded  in  a  number  of  ways.  One  might 
permit  k  users  to  share  another’s  excess  for  some  fixed  k>  1.  The  requirement  that 
total  system  demand  saturates  total  resource  capacity  could  be  eased  (although  this 
may  no  longer  model  resource  balancing,  an  interpretation  mentioned  in  Section  1). 
Lest the reader be left with any suspicion that LSP, as defined here, is overly
constrained, we close with the following result. Let us remove the restriction that every
user must possess the same resource capacity. Let $C_i$ denote the (now arbitrary)
capacity of user $i$ and let LSP$_C$ denote this relaxed version of LSP.

Theorem 5.1. It is NP-complete to determine whether an arbitrary instance of
LSP$_C$ has any feasible solution.

Proof. Modify the proof of Theorem 2.2 (in which $Q$ is now insignificant), by
setting $l_1 = l_2 = s/2$ with $C_1 = C_2 = 0$, and setting $l_{i+2} = 0$ with $C_{i+2} = p_i$ for $1 \le i \le m$.
This instance of LSP$_C$ has a feasible solution if and only if the answer to the given
instance of PARTITION is “yes”. □




Acknowledgement 

We  wish  to  thank  the  two  anonymous  referees,  whose  careful  review  of  our 
original  submission  helped  to  streamline  and  clarify  the  presentation  of  these  results. 


References 


[1]  M.R.  Fellows,  D.K.  Friesen  and  M.A.  Langston,  On  finding  optimal  and  near-optimal  lineal 
spanning  trees,  Algorithmica  3  (1988)  549-560. 

[2]  D.K.  Friesen  and  M.A.  Langston,  Variable  sized  bin  packing,  SIAM  J.  Comput.  15  (1986)  222-230. 

[3]  D.K.  Friesen  and  M.A.  Langston,  Analysis  of  a  compound  bin  packing  algorithm  (to  appear). 

[4]  M.R.  Garey  and  D.S.  Johnson,  Computers  and  Intractability:  A  Guide  to  the  Theory  of
NP-Completeness  (Freeman,  New  York,  1979).

[5]  D.E.  Knuth,  The  Art  of  Computer  Programming,  Vol.  2:  Seminumerical  Algorithms
(Addison-Wesley,  Reading,  MA,  1969).

[6]  M.A.  Langston,  Interstage  transportation  planning  in  the  deterministic  flow-shop  environment, 
Oper.  Res.  35  (1987)  556-564. 

[7]  M.A.  Langston  and  M.P.  Morford,  Resource  allocation  under  limited  sharing,  Computer  Science 
Tech.  Rept.  CS-87-164,  Washington  State  University,  Pullman,  WA  (1987). 

[8]  C.H.  Papadimitriou  and  K.  Steiglitz,  Combinatorial  Optimization:  Algorithms  and  Complexity 
(Prentice-Hall,  Englewood  Cliffs,  NJ,  1982). 

[9]  W.W.  Tsang,  Analysis  of  the  square-the-histogram  method  for  generating  discrete  random  variables, 
M.S.  Thesis,  Department  of  Computer  Science,  Washington  State  University,  Pullman,  WA  (1980). 

[10]  W.W.  Tsang  and  G.  Marsaglia,  A  decision  tree  algorithm  for  squaring  the  histogram  in  random
number  generation,  in:  Proceedings  Australia-Singapore  Joint  Conference  on  Information
Processing  and  Combinatorial  Mathematics,  Singapore  (1986)  325-336.


Reprinted  from 


DISCRETE 

MATHEMATICS 


Discrete  Mathematics  182  (1998)  191-196 

On  algorithmic  applications  of  the  immersion  order1 

An  overview  of  ongoing  work  presented  at  the  Third  Slovenian 
International  Conference  on  Graph  Theory 

Michael  A.  Langston,  Barbara  C.  Plaut* 

Department  of  Computer  Science,  University  of  Tennessee,  Knoxville,  TN  37996-1301,  USA 
Received  18  September  1995;  received  in  revised  form  23  November  1996;  accepted  15  May  1997 




Abstract 

A  snapshot  of  our  current  exploration  of  the  algorithmic  aspects  of  the  immersion  order  is 
presented.  Integrated  circuit  partitioning  is  used  as  a  prototypical  applications  domain.  Decision 
and  search  algorithms,  self-reductions,  closure-preserving  operators  and  related  developments  are 
discussed. 


1.  Background 

We consider only finite, undirected graphs. $H$ is said to be immersed in $G$, written
$H \le_i G$, iff a graph isomorphic to $H$ can be obtained from $G$ by lifting pairs of adjacent
edges and taking a subgraph. A pair of adjacent edges $uv$, $vw$, with $u \ne v \ne w$, is lifted
by removing $uv$ and $vw$ and adding $uw$. As an example, observe that $C_4$ is immersed
in $K_1 + 2K_2$ (Fig. 1).
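The lifting operation and the $C_4$ example can be made concrete with a small multigraph sketch. The representation (a `Counter` of sorted vertex pairs), the vertex labels and the helper names are assumptions chosen for illustration. The sketch also requires $u$, $v$ and $w$ to be pairwise distinct, which avoids creating loops.

```python
from collections import Counter

def edge(u, v):
    """Canonical (sorted) form of an undirected edge."""
    return tuple(sorted((u, v)))

def lift(edges, u, v, w):
    """Lift adjacent edges uv and vw: remove both and add uw.

    `edges` is a Counter mapping canonical edges to multiplicities."""
    if len({u, v, w}) != 3:
        raise ValueError("u, v, w must be pairwise distinct")
    result = Counter(edges)
    for e in (edge(u, v), edge(v, w)):
        if result[e] == 0:
            raise ValueError(f"edge {e} is not present")
        result[e] -= 1
    result[edge(u, w)] += 1
    return +result  # unary plus drops edges whose multiplicity fell to zero

# K_1 + 2K_2: a centre c joined to the endpoints of two disjoint edges ab, xy.
g = Counter(edge(*e) for e in
            [("c", "a"), ("c", "b"), ("c", "x"), ("c", "y"), ("a", "b"), ("x", "y")])
g = lift(g, "a", "c", "x")   # ac, cx  ->  ax
g = lift(g, "b", "c", "y")   # bc, cy  ->  by

# The remaining edges form the 4-cycle a-b-y-x-a; c is now isolated and is
# discarded when taking a subgraph.
c4 = Counter(edge(*e) for e in [("a", "b"), ("b", "y"), ("x", "y"), ("a", "x")])
assert g == c4
```

Two lifts through the centre vertex suffice, after which deleting the isolated centre leaves a graph isomorphic to $C_4$.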

Suppose a family $F$ is closed in this order, that is, $G \in F$ and $H \le_i G \Rightarrow H \in F$. The
obstruction set for $F$ consists of the immersion-minimal elements in $F$’s complement.
Accordingly, $F$ has the following characterization: $G$ is in $F$ iff no obstruction for $F$ is
immersed in $G$. It is known [15] that any such obstruction set is finite. It is also known
[7, 14] that whether $H \le_i G$ is decidable in polynomial time for every fixed $H$.
Thus, there exists a polynomial-time recognition algorithm for any immersion-closed
family of graphs. See [7] for many examples. Such an algorithm is not constructively
known, but possesses a time bound of $O(n^{h+3})$, where $h$ denotes the order of the largest
obstruction.

Fig. 1. $C_4 \le_i K_1 + 2K_2$.


One of the earliest and best-known applications of the immersion order is the min cut
linear arrangement problem [9]. Though NP-complete in general, the fixed-parameter
version of this problem has been shown to be decidable in linear time with the aid of
the immersion order and special tools based on the treewidth metric [7,2]. Much less
is known, however, about the vast majority of applications.


2.  Circuit  partitioning 

Consider the field programmable gate array (henceforth FPGA), a collection of
logic blocks with programmable connections (see [12]). A given circuit is
implemented by partitioning its logic into blocks and connecting the blocks as required
(Fig. 2).

Since circuits are frequently too large to fit on a single chip, they must be
partitioned over several FPGAs. In building systems with multiple FPGAs, fabrication
technology imposes severe restrictions: limits on pin counts (I/O cells) affect inter-chip
connectivity; limits on chip area and density bound FPGA sizes.

Such practical limitations motivate many interesting combinatorial problems.
Consider, for example, the problem we herewith call the Min Degree Graph Partition
problem. In this problem, we are given a graph $G = (V,E)$ and two integers $k$ and
$d$, and are asked whether $V$ can be partitioned into disjoint subsets $V_1, V_2, \ldots, V_m$ so
that, for $1 \le i \le m$, $|V_i| \le k$ and at most $d$ edges have exactly one end-point in $V_i$. In a
multi-FPGA context, for example, $G$ models the circuit to be partitioned, $k$ denotes the
maximum number of logic blocks permitted on a chip, and $d$ represents the maximum
degree or pin count of any chip.
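To make the problem statement concrete, the following sketch checks whether a proposed partition is a valid certificate for Min Degree Graph Partition. It is only a verifier, not a solver; the function name and the edge-list representation (with parallel edges listed repeatedly) are assumptions made for illustration.

```python
def is_valid_partition(edges, parts, k, d):
    """Check a candidate partition against the size bound k and degree bound d.

    edges: list of (u, v) pairs; parallel edges appear multiple times.
    parts: list of disjoint vertex sets covering every endpoint in `edges`."""
    owner = {}
    for index, part in enumerate(parts):
        if len(part) > k:
            return False                      # too many logic blocks on one chip
        for v in part:
            if v in owner:
                raise ValueError(f"vertex {v!r} appears in two subsets")
            owner[v] = index
    boundary = [0] * len(parts)               # edges with exactly one end in V_i
    for u, v in edges:
        if owner[u] != owner[v]:
            boundary[owner[u]] += 1
            boundary[owner[v]] += 1
    return all(count <= d for count in boundary)

# A path a-b-c with k = 2, d = 1: {a,b},{c} works, singletons do not.
path = [("a", "b"), ("b", "c")]
assert is_valid_partition(path, [{"a", "b"}, {"c"}], k=2, d=1)
assert not is_valid_partition(path, [{"a"}, {"b"}, {"c"}], k=2, d=1)
```

Verifying a certificate is straightforward; the difficulty discussed in the text lies in deciding whether any such partition exists.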

This  problem  is  clearly  very  difficult,  in  fact  intractable  without  parameter  bounds, 
via  a  reduction  from  Multiway  Cut  or  Graph  Bisection: 

Theorem 1 (Govindan [10]). Min Degree Graph Partition is NP-complete.

Fortunately,  however,  the  aforementioned  fabrication  limits  can  be  used  to  advantage. 
As  long  as  k  and  d  are  bounded,  the  family  of  ‘yes’  instances  is  closed  in  the  immersion 
order. 

Theorem 2. For any fixed $k$ and $d$, Min Degree Graph Partition can be decided in
polynomial time.

[Fig. 2. Figure labels: Interconnection Resources; I/O Cell.]


Proof  (sketch).  It  is  straightforward  to  check  that  neither  taking  a  subgraph  nor  lifting 
pairs  of  edges  can  turn  a  ‘yes’  instance  into  a  ‘no’  instance.  Hence,  the  ‘yes’  family 
is  immersion  closed.  □ 

The  last  theorem  is  of  particular  interest  in  light  of  the  observation  that,  unlike 
Multiway  Cut,  Min  Degree  Graph  Partition  has  no  known  brute-force  polynomial-time 
algorithm  when  k  and  d  are  fixed.  This  is  in  contrast  to  the  superficially  similar  Graph 
Partition  problem,  in  which  the  cost  of  a  solution  is  summed  over  all  subsets  rather 
than  measured  over  each,  thus  bounding  the  maximum  number  of  partitions. 

Results such as this inherently rely on the existence of finite lists of
immersion-minimal obstructions. As of this writing, little is known about such obstructions in
general or about practical immersion tests in particular. As with the minor order, we
expect that even partial sets can be useful [13]. It has been observed that complete
graphs are often obstructions to immersion-closed families. Testing for $K_1$, $K_2$ or $K_3$
is easy. Testing for $K_4$ turns out to be quite complicated, however, though achievable
in linear time. See [3] for decision, search and parallel algorithms.

Min Degree Graph Partition is an excellent example of the current state of the art.
We have identified a wide array of other problems, largely from the circuit
partitioning domain, amenable to tools based on the immersion order. For most of these,
just as with Min Degree Graph Partition, we can at present say not much more than
that they are (nonconstructively) decidable in polynomial time. Whether they are
solvable in low-order polynomial time, perhaps even linear time, is an open question,
and one we are actively pursuing. One might be tempted to employ the treewidth
metric, useful for Min Cut Linear Arrangement. If the family of ‘yes’ instances has
bounded treewidth, linear time recognizability is assured. But that is not generally
the case. To see this, consider Min Degree Graph Partition with $k = 1$ and $d = 4$.
Even this simple family of graphs contains the $w \times w$ grid for any $w$, a graph with
treewidth $w$. One might also ask about eliminating nonconstructivity. We have
developed some techniques for that task, although they are mainly of theoretical
interest and beyond the scope of this brief review. We refer the reader to [8] for
details.




3.  Search  algorithms  and  self-reducibility 

It is sometimes possible to solve a search problem by reducing it to a related
decision problem. (See [9] for a detailed discussion of search versus decision.) For
example, one might seek to find a satisfying subset assignment for Min Degree Graph
Partition with the aid of a routine that merely tells whether such an assignment
exists.

This approach to algorithm design is called self-reducibility, and has been formulated
in  many  ways  in  the  literature.  In  its  most  limited  form,  an  assortment  of  restrictions 
are  placed  on  the  decision  algorithm,  its  input  and  the  lexicographic  position  of  the 
output  produced  (see,  e.g.,  [16]).  In  more  general  forms,  input/output  limitations  are 
eliminated  and  decision  algorithms  quite  distant  from  the  original  problem  are  permitted 
(see,  e.g.,  [6]).  Additional  variations  exist,  some  even  incorporating  randomness  or 
parallelism  (see,  e.g.,  [4,11]). 

It  is  not  difficult  to  see  that,  for  any  fixed  k  and  d,  Min  Degree  Graph  Partition 
is  self-reducible  in  polynomial  time.  That  is,  one  can  construct  a  satisfying  subset 
assignment,  if  any  exist,  with  at  most  a  polynomial  number  of  calls  to  a  decision 
algorithm,  known  from  the  last  section  also  to  run  in  polynomial  time. 

It  can  in  fact  be  self-reduced  with  only  a  linear  number  of  calls. 

Theorem 3. For any fixed $k$ and $d$, the search version of Min Degree Graph Partition
can be solved in $O(n\,p(n))$ time, where $p(n)$ denotes the time required to solve the
decision version of the problem.

Proof (sketch). No vertex in a ‘yes’ instance has $d + k$ or more neighbors (a star
with $d + k$ rays is an obstruction). Furthermore, in such an instance, there must exist
some satisfying assignment in which each subset induces a connected subgraph. From
this it can be shown that, no matter the rest of the partition, two vertices not
connected by a sufficiently short path need never share the same subset. Thus we know
in advance that, as a solution is recursively constructed, a vertex $v$ need share a subset
only with candidates from a bounded-size neighborhood. Each such candidate, $u$, can
be tested for suitability by adding $d + 1$ copies of the edge $uv$, calling the decision
algorithm and retaining the extra edges only when the resulting graph is also a ‘yes’
instance. □

A number of interesting self-reducibility issues remain open for this order, though
none yet are perhaps as noteworthy as embedding reducibilities are for the minor order
(where the permitted operations are subgraph and edge contraction). For example,
knotlessness [8] is decidable in polynomial time, though it is not known to be searchable
within any time complexity class.
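The edge-duplication step in the proof sketch of Theorem 3 can be illustrated on tiny graphs. The sketch below is not the linear-call algorithm of the theorem: as a simplifying assumption it uses an exponential brute-force `decide` as a stand-in for the (nonconstructive) polynomial-time decision algorithm, and it tests every pair of vertices rather than only a bounded-size neighborhood. All names are hypothetical.

```python
def partitions(vertices, k):
    """Yield every partition of `vertices` into blocks of size at most k."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for sub in partitions(rest, k):
        for i, block in enumerate(sub):
            if len(block) < k:
                yield sub[:i] + [block + [first]] + sub[i + 1:]
        yield sub + [[first]]

def feasible(edges, blocks, d):
    """True if every block has at most d edges leaving it."""
    owner = {v: i for i, block in enumerate(blocks) for v in block}
    cut = [0] * len(blocks)
    for u, v in edges:
        if owner[u] != owner[v]:
            cut[owner[u]] += 1
            cut[owner[v]] += 1
    return all(c <= d for c in cut)

def decide(vertices, edges, k, d):
    """Brute-force stand-in for the decision oracle (exponential time)."""
    return any(feasible(edges, p, d) for p in partitions(vertices, k))

def search(vertices, edges, k, d):
    """Build a satisfying partition using only yes/no answers from `decide`."""
    if not decide(vertices, edges, k, d):
        return None
    current = list(edges)
    block_of = {v: {v} for v in vertices}
    for i, v in enumerate(vertices):
        for u in vertices[:i]:
            if block_of[u] is block_of[v]:
                continue                        # already forced together
            # d + 1 parallel copies of uv force u and v into one subset.
            trial = current + [(u, v)] * (d + 1)
            if decide(vertices, trial, k, d):
                current = trial                 # keep the extra edges
                merged = block_of[u] | block_of[v]
                for x in merged:
                    block_of[x] = merged
    unique = {id(b): b for b in block_of.values()}
    return sorted(sorted(b) for b in unique.values())

path = [("a", "b"), ("b", "c")]
assert search(["a", "b", "c"], path, k=2, d=1) == [["a", "b"], ["c"]]
```

Any pair separated by a partition contributes at least $d + 1$ boundary edges once the copies are added, so a ‘yes’ answer certifies that the pair can share a subset in some solution consistent with the choices already made.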


4.  Closure-preserving  operators 

In  the  case  of  a  ‘no’  instance,  some  sort  of  approximation  scheme  [9]  is  often 
required.  But  increasing  the  size  of  problem  parameters  may  not  be  desirable  or  even 
possible  in  many  settings.  An  approach  with  some  practical  appeal  then  is  to  ask 
instead  whether  one  can  modify  the  graph  (simplify  the  underlying  circuit)  so  that 
it  becomes  a  ‘yes’  instance.  More  generally,  we  seek  systematic  methods  for  making 
such  modifications  so  as  to  preserve  immersion  closure. 

Let  F  denote  a  family  of  graphs,  and  let  Fv(h)  denote  those  graphs  for  which  there 
exists  some  set  of  h  or  fewer  vertices  whose  removal  creates  a  graph  in  F.  When  h  is 
fixed, recognizing $F_v(h)$ can of course be reduced to recognizing $F$ by brute force in
time proportional to $n^h$, a polynomial. If $F$ is minor-closed, however, there is a more
efficient technique. It is known [5] that if $F$ is minor-closed, then so is $F_v(h)$.

Unfortunately, this operator does not work for the immersion order. To see this, let
F denote the family of edgeless graphs, and let h = 1. The star graph with three rays
is in $F_v(1)$, but lifting a pair of its edges yields a matching of size
two, which is not in $F_v(1)$.
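The counterexample is small enough to check mechanically. The sketch below assumes an edge-list representation and two hypothetical helpers, `lift` and `in_Fv`, neither of which comes from the paper. It lifts one pair of edges of the three-ray star and tests membership in $F_v(1)$ by brute force, with F taken to be the edgeless graphs.

```python
from itertools import combinations

def lift(edges, u, v, w):
    """Lift the adjacent pair uv, uw: delete both edges and add vw."""
    result = list(edges)
    result.remove(frozenset((u, v)))
    result.remove(frozenset((u, w)))
    result.append(frozenset((v, w)))
    return result

def in_Fv(vertices, edges, h):
    """True if deleting at most h vertices leaves an edgeless graph."""
    for size in range(h + 1):
        for removed in combinations(vertices, size):
            gone = set(removed)
            # Every edge must lose at least one endpoint.
            if all(edge & gone for edge in edges):
                return True
    return False

vertices = ["c", "x", "y", "z"]
star = [frozenset(("c", leaf)) for leaf in ("x", "y", "z")]
lifted = lift(star, "c", "x", "y")  # remaining edges: cz and xy

star_in_family = in_Fv(vertices, star, 1)
lifted_in_family = in_Fv(vertices, lifted, 1)
print(star_in_family, lifted_in_family)
```

Removing the center destroys every edge of the star, whereas no single vertex meets both edges of the lifted graph.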

So consider edges instead, and let $F_e(h)$ denote those graphs for which there exists
some set of h or fewer edges whose removal creates a graph in F.

Theorem 4. For any fixed h, if F is immersion-closed, then so is $F_e(h)$.

This operator, plus self-reducibility, therefore yields a polynomial-time approach for
solving the decision and search versions of $F_e(h)$ when, for example, F denotes Min
Degree  Graph  Partition.  Other  operators  exist,  but  this  is  perhaps  the  most  natural  from 
an  algorithmic  standpoint. 


5.  In  closing 

Much  is  known  about  complexity-theoretic  issues  for  subgraph,  topological  and  even 
minor  containment  [1].  In  contrast,  we  have  thus  far  really  only  scratched  the  surface  in 
understanding  some  of  the  range  and  depth  of  algorithmic  applications  of  the  immersion 
order.  Many  challenging  open  questions  beckon,  several  of  which  we  have  attempted 
to  illuminate  here. 
Abstract

Pathwidth  is  a  well-known  NP-complete  graph  metric.  We  present  a  technique  to  approximate  the  pathwidth  of  outerplanar 
graphs.  Although  a  polynomial-time  algorithm  is  already  known  to  determine  the  pathwidth  of  outerplanar  graphs,  this 
algorithm is not practical. Our algorithm works in O(n log n) time on graphs of order n, is practical and produces solutions
at  most  three  times  the  optimum.  ©  1998  Elsevier  Science  B.V.  All  rights  reserved. 

Keywords:  Algorithms;  Pathwidth;  Outerplanar  graphs;  Approximation;  Tree  decomposition 


1.  Introduction 

Pathwidth  was  defined  by  Robertson  and  Seymour 
in  their  seminal  series  of  papers  on  Graph  Minors 
[11].  Since  then,  this  metric  has  found  application 
in  many  areas,  ranging  from  circuit  layout  to  natural 
language  processing  [6,10].  Determining  pathwidth 
is  NP-complete  [8].  Thus,  it  is  natural  to  search  for 
fast  approximation  algorithms.  No  polynomial-time 
relative  approximation  algorithm  (one  whose  solution 
is  within  a  multiplicative  constant  of  the  optimum) 
is  known  for  the  general  problem.  Moreover,  no 
polynomial-time  absolute  approximation  algorithm 
(one  whose  solution  is  within  an  additive  constant  of 
the  optimum)  can  exist  unless  P  =  NP  [3]. 

The main result of this paper is a practical relative
approximation algorithm for the pathwidth problem
on outerplanar graphs. Since outerplanar graphs have
treewidth two or less, the methods in [1] can, in
principle, be used to compute the pathwidth exactly
in polynomial time. This is not a realistic option,
however, because of the high degree of the polynomial
(more than four times the treewidth). In contrast,
our algorithm approximates the pathwidth to within a
factor of three of the optimum in O(n log n) time.

* This research is supported in part by the Office of Naval
Research under contract N00014-90-J-1855.

2.  Our  approach 

We  consider  only  connected  graphs  without  loops 
or  multiple  edges. 1 

2.1.  Tree  and  path  decompositions 

A tree decomposition of a graph G is a pair (T, Y),
where T is a tree and $Y = \{Y_i \mid i \in V(T)\}$ is a
collection of subsets of V(G) such that (i) for each
edge $e \in E(G)$, some $Y_i$ contains both endpoints of
e, and (ii) for all $i, j, k \in V(T)$, if j is on the path
between i and k in T, then $Y_i \cap Y_k \subseteq Y_j$. The width of a tree
decomposition (T, Y) is one less than the size of the
largest set in Y. The treewidth of G (denoted tw(G))
is the smallest width of all its tree decompositions.

1 Thus an edge is uniquely specified by its endpoints. For
example, ab denotes an edge between vertex a and vertex b.

A path decomposition of G is a sequence $X_1, \ldots, X_r$
of subsets of V(G) such that (i) for each edge
$e \in E(G)$, some $X_i$ contains both endpoints of e, and
(ii) for $1 \le i < j < k \le r$, $X_i \cap X_k \subseteq X_j$. The width of
a path decomposition $X_1, \ldots, X_r$ is one less than the
size of the largest set $X_i$, $1 \le i \le r$. The pathwidth of
G (denoted pw(G)) is the smallest width of all its path
decompositions.
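The two conditions can be checked directly. The following is a minimal sketch, assuming bags are given as a list of Python sets and edges as pairs; the names `is_path_decomposition` and `width` are illustrative only. Condition (ii) is tested in an equivalent form: the bags containing any given vertex must occupy consecutive positions.

```python
def is_path_decomposition(edges, bags):
    """Check conditions (i) and (ii) for a sequence of bags."""
    # (i) every edge has both endpoints in some bag
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags):
            return False
    # (ii) the indices of the bags holding a vertex must be consecutive
    positions = {}
    for index, bag in enumerate(bags):
        for vertex in bag:
            positions.setdefault(vertex, []).append(index)
    for indices in positions.values():
        if indices[-1] - indices[0] + 1 != len(indices):
            return False
    return True

def width(bags):
    """One less than the size of the largest bag."""
    return max(len(bag) for bag in bags) - 1

# The cycle a-b-c-d-a.
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
good = [{"a", "b", "d"}, {"b", "c", "d"}]
bad = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "a"}]

good_ok = is_path_decomposition(cycle_edges, good)
bad_ok = is_path_decomposition(cycle_edges, bad)
good_width = width(good)
print(good_ok, bad_ok, good_width)
```

The second sequence covers every edge but fails condition (ii), because vertex a appears in the first and last bags only.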

2.2.  A  conversion  procedure 

Path decompositions can be derived from tree decompositions.
We employ such a procedure, td2pd,
and prove its correctness. It requires a routine to construct
optimal path decompositions of trees. For this,
we first construct a layout that minimizes vertex separation
using the method presented in [5], and then convert
this layout into a path decomposition as described
in [9].

Procedure td2pd

Input: A tree decomposition (T, Y) of a graph G.
Output: A path decomposition of G.

begin procedure
    X_1, ..., X_r := an optimal path decomposition of T;
    for 1 ≤ i ≤ r do
        P_i := ∪_{j ∈ X_i} Y_j;
    output P_1, ..., P_r;
end procedure
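The conversion step itself is short. The sketch below is one possible rendering, under the assumption that an optimal path decomposition of T has already been computed elsewhere (the routine of [5,9] is not reproduced here). The function signature and the dictionary representation of the bags are illustrative choices, not the authors' implementation.

```python
def td2pd(tree_bags, tree_path_decomposition):
    """Replace each tree node in X_i by its bag Y_j and take the union.

    tree_bags: dict mapping each node j of T to the set Y_j.
    tree_path_decomposition: list of sets X_1, ..., X_r of nodes of T,
        assumed to be a valid (ideally optimal) path decomposition of T.
    """
    return [set().union(*(tree_bags[j] for j in X_i))
            for X_i in tree_path_decomposition]

# G is a star with center c; T is a star with center node 0.
tree_bags = {0: {"c"}, 1: {"c", "x"}, 2: {"c", "y"}, 3: {"c", "z"}}
path_decomposition_of_T = [{0, 1}, {0, 2}, {0, 3}]

result = td2pd(tree_bags, path_decomposition_of_T)
result_width = max(len(bag) for bag in result) - 1
print(result, result_width)
```

Here the tree decomposition has width 1 and the path decomposition of T has width 1, so the output width must be at most (1 + 1)(1 + 1) - 1 = 3.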

Let n denote the order of T. Using the results of
[5,9], it takes O(n log n) time to construct an optimal
path decomposition $X_1, \ldots, X_r$ of T. The following
properties hold: $r \le n$, and for $1 \le i \le r$, $|X_i|$ is
O(log n). Thus td2pd has O(n log n) time complexity
as long as the input tree decomposition has bounded
width. This is true for outerplanar graphs, which have
treewidth at most two.

Theorem 1. Let (T, Y) denote a width-t tree decomposition
of a graph G. Then td2pd((T, Y)) returns
a path decomposition of G with width no more than
$(t + 1)(pw(T) + 1) - 1$.


Proof. Let $X_1, \ldots, X_r$ denote the optimal path decomposition
of T constructed in td2pd, and let $P_1, \ldots, P_r$
denote the output of td2pd. Then, for $1 \le i \le r$,

$$|P_i| = \Bigl|\bigcup_{j \in X_i} Y_j\Bigr| \le (t + 1)(pw(T) + 1).$$

Thus the width condition is satisfied, and we only need
to check that $P_1, \ldots, P_r$ is a valid path decomposition
of G.

It is easy to see that $P_1, \ldots, P_r$ covers all edges
in G. We prove by contradiction that $P_1, \ldots, P_r$ has
the intersection property. If the intersection property
does not hold, then for some $1 \le i < j < k \le r$,
there is a vertex v in $P_i \cap P_k$ that is not in $P_j$. Since
$v \in P_i \cap P_k$, there must exist $l \in X_i$ and $m \in X_k$ such
that v belongs to $Y_l$ and $Y_m$. Consider the subsets $V_1$
and $V_2$ of V(T), where

$$V_1 = \bigcup_{p < j} X_p - X_j \quad \text{and} \quad V_2 = \bigcup_{p > j} X_p - X_j.$$

The intersection property of $X_1, \ldots, X_r$ implies that
$V_1$ and $V_2$ are disjoint. Moreover, there is no edge
in T connecting $V_1$ and $V_2$, because some $X_q$ must
contain both endpoints of such an edge, contradicting
the disjointness of $V_1$ and $V_2$. Thus every path between
$V_1$ and $V_2$ in T contains a vertex from $X_j$. In
particular, the path between l and m must contain a
vertex, say h, from $X_j$. By the intersection property of
(T, Y), $v \in Y_h$. Since $h \in X_j$, $Y_h \subseteq P_j$ and $v \in P_j$, a
contradiction. □


3.  Path  decompositions  of  outerplanar  graphs 

A graph is outerplanar if it has a planar embedding
with all vertices lying in a single face. Outerplanar
graphs have treewidth at most two. In this section, we
develop an algorithm that, for an outerplanar graph G,
constructs an optimal tree decomposition (T, Y) with
$pw(T) \le pw(G)$. By Theorem 1, running td2pd on
(T, Y) produces a path decomposition with width at
most $3 \times pw(G) + 2$.
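The stated bound is Theorem 1 with the two facts just given substituted in. Writing t for the width of the tree decomposition, we have $t \le 2$ and $pw(T) \le pw(G)$, so

$$(t + 1)(pw(T) + 1) - 1 \;\le\; (2 + 1)(pw(G) + 1) - 1 \;=\; 3\,pw(G) + 2.$$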

We say that a tree decomposition (T, Y) is simple
if (T, Y) has width at most two, T is a subgraph
of G, and $v \in Y_v$ for all $v \in V(T)$. Because the
pathwidth of a subgraph of G cannot be greater than
the pathwidth of G, if (T, Y) is simple, then $pw(T) \le pw(G)$.
Our algorithm constructs (T, Y) by combining
tree decompositions of G's subgraphs as described in
the following lemma.

Lemma 2. Let (T', Y') and (T'', Y'') denote tree
decompositions of graphs G' and G'', respectively.
Suppose V(T') and V(T'') are disjoint, and there
are vertices $u \in V(T')$ and $v \in V(T'')$ such that all
vertices in $V(G') \cap V(G'')$ are in both $Y'_u$ and $Y''_v$.
Then we may obtain a tree decomposition (T, Y) of
$G = G' \cup G''$ by setting $T = T' \cup T'' \cup \{uv\}$ and
$Y = Y' \cup Y''$ (see footnote 2). Moreover, (T, Y) is simple if (T', Y')
and (T'', Y'') are simple and $uv \in E(G)$.

Proof. Since (T', Y') covers all edges in G' and
(T'', Y'') covers all edges in G'', (T, Y) covers all
edges in G. To verify that (T, Y) has the intersection
property, let i, j and k be vertices in T such that j is
on the path between i and k. We need to show that
$Y_i \cap Y_k \subseteq Y_j$. This follows immediately if both i and k
belong to T' or both belong to T''. So assume, without
loss of generality, that $i \in V(T')$ and $k \in V(T'')$. Note
that the path between i and k contains both u and
v. If $j \in V(T')$, then by the intersection property of
(T', Y'),

$$Y_i \cap Y_k \subseteq Y_i \cap Y_u = Y'_i \cap Y'_u \subseteq Y'_j = Y_j.$$

If $j \in V(T'')$, then by the intersection property of
(T'', Y''),

$$Y_i \cap Y_k \subseteq Y_v \cap Y_k = Y''_v \cap Y''_k \subseteq Y''_j = Y_j.$$

□
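The construction in Lemma 2 amounts to a disjoint union plus one new tree edge. A minimal sketch follows, assuming a tree decomposition is stored as a triple (set of tree nodes, set of tree edges, dict of bags). The function name `combine` and the explicit `shared_vertices` argument are illustrative assumptions, not part of the paper.

```python
def combine(td_a, td_b, u, v, shared_vertices):
    """Join two tree decompositions by the tree edge uv, as in Lemma 2."""
    nodes_a, edges_a, bags_a = td_a
    nodes_b, edges_b, bags_b = td_b
    if nodes_a & nodes_b:
        raise ValueError("V(T') and V(T'') must be disjoint")
    if not (shared_vertices <= bags_a[u] and shared_vertices <= bags_b[v]):
        raise ValueError("shared vertices must lie in both Y'_u and Y''_v")
    nodes = nodes_a | nodes_b
    edges = edges_a | edges_b | {(u, v)}
    bags = {**bags_a, **bags_b}  # Y = Y' ∪ Y''
    return nodes, edges, bags

# G' is the triangle a, b, c and G'' is the triangle b, c, d.
# Each has a one-node tree decomposition; the tree nodes are named p and q.
td_first = ({"p"}, set(), {"p": {"a", "b", "c"}})
td_second = ({"q"}, set(), {"q": {"b", "c", "d"}})

nodes, edges, bags = combine(td_first, td_second, "p", "q", {"b", "c"})
print(nodes, edges, bags)
```

The precondition check mirrors the hypothesis of the lemma; without it, the merged structure need not satisfy the intersection property.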

3.1.  Biconnected  outerplanar  graphs 

We  concentrate  initially  on  biconnected  graphs 
(those  without  cut  points). 

Lemma 3. Let G be biconnected, outerplanar and of
order at least three. Let v denote an arbitrary vertex in
G. Then G contains a path P (see footnote 3) with at least two edges
such that the following conditions hold:

• all vertices in P, except possibly its endpoints, have
degree two in G,

• the endpoints of P are adjacent in G, and

• v is either an endpoint of P or not in P.

2 $T' \cup T'' \cup \{uv\}$ denotes the tree with vertex set $V(T') \cup V(T'')$
and edge set $E(T') \cup E(T'') \cup \{uv\}$. $Y = Y' \cup Y''$ denotes the
collection $\{Y_v \mid v \in V(T') \cup V(T'')\}$, where $Y_v$ equals either $Y'_v$
or $Y''_v$ depending on whether $v \in V(T')$ or $v \in V(T'')$.

Proof. Fix an outerplanar layout of G. Since G is
biconnected, the outer face of this layout defines a
Hamiltonian circuit in G. Let I denote the set of
internal edges of G (those not on the external face).
The proof proceeds by induction on |I|. For the basis
case, |I| = 0, G is a cycle and the lemma is satisfied by
setting P to G − {uv}, where u is an arbitrary vertex
adjacent to v. For the induction hypothesis, assume the
lemma for $|I| = i \ge 0$. For the induction step, consider
the case in which |I| = i + 1. Then G can be expressed
as the union of two smaller biconnected outerplanar
graphs, G' and G'', each of order at least three, that
share only two vertices, a and b, and the edge ab.
Assume, without loss of generality, that $v \in V(G'')$
and that $v \ne b$. By hypothesis, G' contains a path, P',
all of whose non-endpoint vertices have degree two,
whose endpoints are adjacent, and that excludes v' =
a as a non-endpoint vertex. Since $v \notin V(G') - \{a\}$,
P' cannot contain v as a non-endpoint vertex. Thus
setting P = P' satisfies the lemma. □

Fig.  1  illustrates  Lemma  3,  with  vertex  9  playing 
the  role  of  v.  Both  paths  marked  with  dashed  edges 
satisfy the lemma. (The path 7-8-9-10 does not satisfy
the lemma, because it contains v as a non-endpoint
vertex.)




3 If $v_1, \ldots, v_k$ is a sequence of distinct vertices in G such that $v_i$
is adjacent to $v_{i+1}$ for $1 \le i < k$, then the subgraph H of G with
$V(H) = \{v_1, \ldots, v_k\}$ and $E(H) = \{v_i v_{i+1} \mid 1 \le i < k\}$ is said to be
a path in G.
If a path P contains at least two edges and has
endpoints w and x, then it has a width-two tree
decomposition (T, Y) such that T = P − {w, x} and
for $i \in V(T)$, $Y_i = \{x, i, j\}$, where j is the neighbor
of i on w's side (the sets $Y_i$ actually form a path
decomposition of P). We call (T, Y) a w-extensible
tree decomposition of P. Fig. 2 shows a path and its
w-extensible tree decomposition. The sets $Y_i$ are shown
inside the ovals.
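The definition translates almost line for line into code. The sketch below assumes the path is supplied as a list of vertices ordered from w to x; the function name and return format are illustrative assumptions.

```python
def extensible_tree_decomposition(path):
    """Build the w-extensible tree decomposition of a path [w, ..., x].

    Returns (tree_nodes, tree_edges, bags), where the tree is the path
    with both endpoints removed and each bag is {x, i, neighbor of i on
    w's side}.
    """
    if len(path) < 3:
        raise ValueError("the path needs at least two edges")
    x = path[-1]
    interior = path[1:-1]
    tree_edges = list(zip(interior, interior[1:]))
    bags = {path[i]: {x, path[i], path[i - 1]}
            for i in range(1, len(path) - 1)}
    return interior, tree_edges, bags

tree_nodes, tree_edges, bags = extensible_tree_decomposition(["w", "a", "b", "x"])

# Every edge of the path lies inside the bag of one of its endpoints.
path_edges = [("w", "a"), ("a", "b"), ("b", "x")]
covered = all(any({p, q} <= bag for bag in bags.values())
              for p, q in path_edges)
print(tree_nodes, tree_edges, bags, covered)
```

Note that the bag of w's neighbor contains both w and x, which is what lets the decomposition be attached at w.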

Note that for every edge $ij \in E(P)$, either $\{i, j\} \subseteq Y_i$
or $\{i, j\} \subseteq Y_j$. We use the notion of extensibility
to derive bc-op-td, our algorithm (see footnote 4) to construct
simple tree decompositions of biconnected outerplanar
graphs.

Procedure bc-op-td

Input: A biconnected outerplanar graph G of order
two or more, and a vertex v in G.

Output: A simple tree decomposition (T, Y) of G,
with T spanning G − {v}.

begin procedure
    if |V(G)| = 2
    then begin
        u := the vertex adjacent to v;
        Y_u := {u, v}, T := {u}, Y := {Y_u};
    end;
    else begin
        P := a path, between some two vertices w and x, that satisfies Lemma 3;
        (T', Y') := bc-op-td(G − (P − {w, x}), v);
        if {w, x} ⊆ Y'_w
        then begin
            e := the edge incident on w in P;
            (T'', Y'') := the w-extensible tree decomposition of P;
        end;
        else begin
            e := the edge incident on x in P;
            (T'', Y'') := the x-extensible tree decomposition of P;
        end;
        T := T' ∪ T'' ∪ {e}, Y := Y' ∪ Y'';
    end;
    output (T, Y);
end procedure

4 The recursive call in this algorithm uses graph difference,
whereby $G_1 - G_2$ denotes the subgraph of $G_1$ induced by
$V(G_1) - V(G_2)$.

At  this  point,  we  may  as  well  assume  that  v  is 
chosen  at  random.  A  specific  choice  of  v  is  necessary 
when  G  is  a  biconnected  component  of  a  larger  graph 
(see  Section  3.3). 

Lemma 4. Let G be biconnected and outerplanar,
and let v denote a vertex in G. Let (T, Y) denote the
result of the call to bc-op-td(G, v). Then (T, Y) is a
simple tree decomposition of G, and T is a spanning
tree of G − {v}.

Proof. We prove, using induction on |E(G)|, a somewhat
stronger result. We show that (T, Y) is simple,
that T spans G − {v}, and that for each edge ij in
G, either $i \in V(T)$ with $\{i, j\} \subseteq Y_i$ or $j \in V(T)$ with
$\{i, j\} \subseteq Y_j$. The lemma holds for the basis case, in
which G contains just one edge. If |E(G)| > 1, let
P, with endpoints w and x, denote a path that satisfies
Lemma 3. Let G' denote G − (P − {w, x}).
Thus v is in G'. By the induction hypothesis,
bc-op-td(G', v) returns a simple tree decomposition (T', Y'),
with T' spanning G' − {v}, and with $\{w, x\} \subseteq Y'_w$ or
$\{w, x\} \subseteq Y'_x$. Assume, without loss of generality, that
$\{w, x\} \subseteq Y'_w$. Let (T'', Y'') denote the w-extensible
tree decomposition of P. Then T'' = P − {w, x} and
$\{w, x\} \subseteq Y''_a$, where a is w's neighbor in P. T is
formed by adding an edge between vertex w in T'
and vertex a in T''. The only vertices common to G'
and P are w and x, which are contained in both $Y'_w$
and $Y''_a$. Therefore (T, Y) is a valid tree decomposition
of G. (T, Y) is simple because (T', Y') and (T'', Y'')
are simple, with the edge wa existing in G. T spans
G − {v} because T' spans G' − {v} and T'' spans
P − {w, x}. To complete the induction, observe that
for each edge $ij \in E(G)$, $\{i, j\} \subseteq Y_i$ or $\{i, j\} \subseteq Y_j$,
because either $\{i, j\} = \{w, a\} \subseteq Y''_a$ or $\{i, j\}$ is contained
in one of $Y'_i$, $Y'_j$, $Y''_i$ and $Y''_j$. □

3.2.  Time  complexity 

Let G, of order n, and v denote the input to
bc-op-td. We store G in doubly-linked adjacency list
format. This is space-efficient, because G can have at
most 2n − 3 edges. We also employ a few additional
links. To facilitate the removal of an edge ab, links
are maintained between the copy of b in a's adjacency
list and the copy of a in b's adjacency list. The only
steps  in  bc-op-td  that  take  more  than  constant  time  are 
(i)  finding  a  path  P  that  satisfies  Lemma  3,  (ii)  deleting 
the  edges  and  non-endpoint  vertices  of  P  from  G  and 
(iii)  constructing  an  extensible  tree  decomposition  of 
P.  Of  these,  steps  (ii)  and  (iii)  take  at  most  linear 
time  over  all  calls  to  bc-op-td.  Thus  the  question  of 
efficiency  reduces  to  the  implementation  of  step  (i). 
One  fast  method  is  described  below. 

Some  preprocessing  is  required.  We  first  construct 
an  outerplanar  layout  of  G  and  find  the  Hamiltonian 
circuit  that  constitutes  its  external  face  [4].  We  scan 
the  layout  in  a  clockwise  direction,  starting  at  v, 
and  number  vertices  in  the  order  in  which  they  are 
encountered.  Then  we  rearrange  the  adjacency  list 
of  each  vertex  so  that  the  elements  in  the  list  are 
numbered  in  ascending  order.  (An  efficient  means  to 
do this visits the vertices in order; for each vertex a
and each vertex b in a's adjacency list, we insert a into
b's adjacency list in a new structure.) Each of these
tasks  takes  only  linear  time. 
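The parenthetical sorting trick can be sketched as follows. This is only an illustration, assuming adjacency lists are held in a dictionary and that `order` lists the vertices by their assigned numbers; the paper's implementation uses linked lists in C.

```python
def sort_adjacency_lists(adjacency, order):
    """Rebuild every adjacency list in ascending vertex order in linear time.

    adjacency: dict mapping each vertex to a list of its neighbors.
    order: the vertices listed in ascending order of their numbers.
    """
    sorted_lists = {vertex: [] for vertex in order}
    for a in order:                    # visit vertices in ascending order
        for b in adjacency[a]:
            sorted_lists[b].append(a)  # so a is appended to b's list in order
    return sorted_lists

# The cycle 1-2-3-4 with the chord 1-3, lists deliberately unsorted.
adjacency = {1: [4, 2, 3], 2: [3, 1], 3: [1, 4, 2], 4: [3, 1]}
result = sort_adjacency_lists(adjacency, [1, 2, 3, 4])
print(result)
```

Each edge is handled twice, once from each endpoint, so the total work is proportional to the number of vertices plus the number of edges.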

Once  preprocessing  is  completed,  paths  to  play  the 
role  of  P  are  found  during  a  second  clockwise  scan 
and calls to procedure find-path. For $1 \le i \le n$, let $v_i$
denote the vertex numbered i. A variable k, initialized
to 1, stores the number of the last vertex scanned.
During each call to find-path, the scan starts from this
vertex and continues until a satisfactory path is found.
As paths are found, non-endpoint vertices are removed
(by bc-op-td), but the original vertex numbering is
maintained. A stack vstack, initially empty, holds
previously  scanned  vertices  of  degree  three  or  more 


(these  vertices  are  candidates  for  the  endpoints  of 
paths).  We  say  that  two  vertices  are  internal  neighbors 
if  they  are  connected  by  an  internal  edge. 

Procedure find-path

Input: A biconnected outerplanar graph G of order
three or more with V(G) ⊆ {v_1, ..., v_n}, a
stack vstack of vertices, and an integer k.
Output: A path P that satisfies Lemma 3, with v_1
playing the role of v.

begin procedure
    if G is a cycle
    then P := G − {v_1 v_n};
    else begin
        while v_k has no lower-numbered internal neighbor
        do begin
            if δ(v_k) ≥ 3 then push v_k on to vstack;
            k := k + 1;
        end;
        v_j := the vertex on top of vstack;
        P := the path on the external face from v_j to v_k;
        if δ(v_j) = 3 then pop v_j from vstack;
    end;
    output P;
end procedure

Lemma  5.  A  path  returned  by  find-path  satisfies 
Lemma  3. 

Proof. Let G, P, j and k be as defined in find-path.
The lemma holds immediately if G is a cycle,
so assume G contains one or more internal edges.
Since $1 \le j < k$, P does not contain $v_1$ as a non-endpoint
vertex. Let $v_a$ denote the lowest-numbered
internal neighbor of $v_k$. Clearly a < k. Let $v_b$ denote
the highest-numbered internal neighbor of $v_j$. Since
$v_j$ was on the stack, it had no lower-numbered internal
neighbors when $v_k$ was scanned, and so j < b. For
$v_i$, a non-endpoint vertex of P, $\delta(v_i) = 2$, else $v_i$
would either have been pushed on vstack after $v_j$, or a
path connecting $v_j$ and $v_i$ would already have been
returned. Thus $a \le j$ and $k \le b$. It cannot be that
both a < j and k < b, because otherwise the edges
$v_a v_k$ and $v_j v_b$ would intersect, which is impossible in
an outerplanar layout. So either a = j or k = b, and
the internal edge $v_j v_k$ must exist. Thus P satisfies
all three properties required by Lemma 3: its non-endpoint
vertices all have degree two, its endpoints are
adjacent, and it excludes $v_1$ (which plays the role of v)
as a non-endpoint vertex. □

The total amount of work done by find-path is
linear in n. To see this, observe that the number of
iterations of the while loop, summed over all calls to
find-path, is at most n. Thus each statement in find-path
is executed O(n) times. The only statement that
takes more than constant time is the assignment of a
value to P, which takes O(|V(P)|) time. Since the
order of G decreases by O(|V(P)|) after each such
assignment, the total amount of work done during this
operation is also O(n). It follows that bc-op-td has
linear complexity.

3.3.  Tackling  general  outerplanar  graphs 

We  now  generalize  our  algorithm  to  handle  all 
outerplanar  graphs. 

Procedure op-td

Input: An outerplanar graph G of order two or
more, and sets B and C of its biconnected
components and cut points.

Output: A simple tree decomposition (T, Y) of G,
with T spanning G.

begin procedure
    if G is biconnected
    then begin
        u, v := any two adjacent vertices in G;
        (T', Y') := bc-op-td(G, v);
        Y_v := {v}, T := T' ∪ {v} ∪ {uv}, Y := Y' ∪ {Y_v};
    end;
    else begin
        B_i := an element of B that contains exactly one vertex v from C;
        if v is not a cut point in G − (B_i − {v}) then C := C − {v};
        (T', Y') := bc-op-td(B_i, v);
        (T'', Y'') := op-td(G − (B_i − {v}), B − {B_i}, C);
        u := an arbitrary neighbor of v in B_i;
        T := T' ∪ T'' ∪ {uv}, Y := Y' ∪ Y'';
    end;
    output (T, Y);
end procedure


Lemma 6. Let G be outerplanar. Let (T, Y) denote
the result of the call to op-td(G). Then (T, Y) is a
simple spanning tree decomposition of G.

Proof. The proof proceeds by induction on the number
of biconnected components of G. The basis case,
when G is biconnected, follows from Lemma 4 and
the modifications made to (T, Y) after the call to
bc-op-td(G, v). So let $B_i$, v, (T', Y'), (T'', Y'') and u be
as defined in op-td. Let G'' denote $G - (B_i - \{v\})$.
From the proof of Lemma 4, we know that (T', Y')
is simple, that it spans $B_i - \{v\}$, and that $\{u, v\} \subseteq Y'_u$
(there is no $Y'_v$). By the induction hypothesis, (T'', Y'')
is a simple spanning tree decomposition of G''. Thus,
by construction, (T, Y) is a simple spanning tree decomposition
of G. □

Biconnected components and cut points can be
found using a depth-first search. Procedure op-td
builds an optimal tree decomposition using bc-op-td.
This tree decomposition is converted into a path
decomposition using td2pd. Recalling Theorem 1, and
noting that td2pd is asymptotically the slowest of the
aforementioned steps, we achieve the following main
result.

Theorem 7. If G is an outerplanar graph of order
n, a path decomposition of G with width at most
$3 \times pw(G) + 2$ can be constructed in O(n log n) time.

We strongly suspect that this bound can be reduced
to O(n). After all, we have gone to some effort
here to devise linear-time techniques. Only computing
the  path  decomposition  of  a  tree  uses  super-linear 
time.  We  are  aware  that  claims  of  linear-time  path 
decomposition  algorithms  for  trees  have  appeared  in 
the  literature;  but  none  to  our  knowledge  has  a  credible 
level  of  detail,  such  as  that  found  in  [5]. 

4.  Concluding  remarks 

We  have  implemented  our  algorithm  in  the  C 
programming  language.  Tests  on  a  SPARC  ULTRA 
indicate  that  the  implementation  is  fast  in  practice, 
taking,  for  instance,  less  than  two  seconds  to  compute 
the  path  decomposition  of  a  graph  with  ten  thousand 
vertices. It is difficult to gauge the quality of the
solutions produced, because there is no practical way
to obtain optimal path decompositions for comparison.
As a compromise, we tested the program on pseudorandom
outerplanar graphs of known pathwidth. These
tests indicate that the approximate decompositions
tend to have much smaller width than the worst case
guarantee.

Our work has exploited the fact that if the width
of a tree decomposition (T, Y) of G is bounded,
and if pw(T) is within some constant multiple of
pw(G), then we can construct a path decomposition
of G whose width is at most a constant times pw(G).
Series-parallel graphs also have treewidth at most
two. Optimal tree decompositions for them can be
constructed quickly. We believe that, for these graphs,
it is possible to ensure $pw(T) \le 2\,pw(G)$, yielding a
factor-of-six relative approximation algorithm.

On a more general note, we conjecture that any
graph G has an optimal tree decomposition (T, Y)
such that $pw(T) \le pw(G)$. If true, a constructive
proof of this would provide a relative approximation
algorithm for any class of graphs whose bounded-width
tree decompositions can be found efficiently.
Currently, this class includes all graphs of treewidth
four or less, Halin graphs and, for any fixed k,
k-chordal graphs, k-outerplanar graphs and graphs with
disk dimension k, to name just a few [2,6,7].
Abstract

Huang,  B.-C.  and  M.A.  Langston,  Stable  set  and  multiset  operations  in  optimal  time  and  space,  Information  Processing 
Letters  39  (1991)  131-136. 

We  devise  time-space  optimal  methods  for  stably  performing  set  and  multiset  operations  on  sorted  files  of  data.  For  the  sake 
of  complete  generality,  our  techniques  neither  modify  records  nor  require  any  information  other  than  a  record’s  key. 

Keywords: Analysis of algorithms, computational complexity, data management


1.  Introduction 

Rearranging the sequence of records within a
file based on the relative value of each record's
key is an operation of fundamental importance to
computer science. Common examples of this type
of operation include sorting a random file, extracting
duplicates from a sorted file, merging two or
more sorted subfiles, and many others. It is often
desirable that such an operation be stable, by
which we mean that records with equal keys retain
their original relative order.


*  A  preliminary  version  of  this  paper  was  presented  at  the 
Seventh  ACM  SIGACT-SIGMOD-SIGART  Symposium  on 
Principles  of  Database  Systems  held  in  Austin,  Texas,  in 
March,  1988.  This  research  has  been  supported  in  part  by 
the  National  Science  Foundation  under  grant  MIP-8919312 
and  by  the  Office  of  Naval  Research  under  contract 
N00014-88-K-0343. 


Performing such an operation in the optimal
amount of time (to within a constant factor) is
usually not very difficult, but the obvious methods
incur a considerable cost in the overhead associated
with additional temporary storage to be used
by the fast rearrangement algorithm. On the other
hand, slower methods typically exist for performing
such an operation in-place, where a few
extra storage cells aid the rearrangement process,
but whose total number is constant, independent
of the file's size.

The  design  and  analysis  of  time-space  optimal 
file  rearrangement  algorithms,  those  that  achieve 
lower  bounds  on  both  time  and  extra  space 
simultaneously,  is  an  appealing  area  of  research, 
so  far  largely  unexplored.  Through  the  work  of  [8], 
it  is  known  that  stable  merging  (and,  hence,  stable 
sorting  by  merging)  permits  such  a  method.  More 
recently,  the  authors  have  shown  that  stable 
duplicate-key  extraction  does  as  well  [2]. 
The goal of this brief paper is to prove the
existence of time-space optimal methods for the
elementary binary set and multiset operations (see footnote 1) on
sorted files. In the next section, we introduce some
required notation and define a few useful, primitive
suboperations. Section 3 contains our presentation
of a time-space optimal selection scheme, in
which matched records are selected from a sorted
file. Primarily through the use of this selection
strategy, we prove in Section 4 that each set operation
can be performed on sorted files of data in
optimal time and space. In Section 5, we generalize
these operations to multisets, with special
attention to file processing applications, and show
that each can again be performed in optimal time
and space. The final section of this paper contains
a few remarks pertinent to the future study of this
general topic, and provides upper bounds on the
constants of proportionality of our methods.

2. Notation, definitions and other relevant preliminaries

Let L denote a list (internal file) of n records,
indexed from 1 to n. We use KEY(i) as a
shorthand to denote the key of the record with
index i. Only the two common O(1) time and
space primitive operations are assumed, namely,
record exchanges and key comparisons. The exchange
procedure, SWAP(i, j), directs that the
ith and jth records are to be exchanged. The
comparison functions, for example KEY(i) <
KEY(j), return the expected Boolean values dependent
on the relative values of the keys being
compared.

From these primitive operations, we construct a
few useful O(1) space subprograms for dealing
with blocks. Let us define a block to be a set of
records from L with consecutive indices. The head
of a block is the record with the lowest index (or,
informally, the "leftmost" record in the block).
The procedure BLOCKSWAP(i, j, h) exchanges
a block of h records beginning at index i with a
block of h records beginning at index j in O(h)
time. We specify that blocks do not partially overlap
(i.e., if i ≠ j, then h ≤ |i − j|) and that, when
BLOCKSWAP is finished, records within a moved
block retain the order they possessed before
BLOCKSWAP was invoked. A block of h records
beginning at index i is sorted in nondecreasing
order by the procedure SORT(i, h). The procedure
BLOCKSORT(i, h, p) uses BLOCKSWAP
to rearrange the p consecutive blocks, each with h
records, beginning at index i so that their heads
are sorted in nondecreasing order. To reduce unnecessary
record movement, we assume SORT
and BLOCKSORT use a straight selection sort
[4], yielding respective time complexities O(h²)
and O(p² + ph). Finally, the procedure
ROTATE(i, h, l) rotates (circularly shifts) a block
of h records, beginning at index i, l places to the
left. We assume ROTATE is implemented in the
common fashion with three sublist reversals,
thereby requiring no more than h invocations of
SWAP.

1 These operations are commonly defined to be union, intersection,
difference (relative complement) and exclusive OR
(symmetric difference).
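
As an illustration of the block primitives just described, the following is a minimal Python sketch of ROTATE (by three sublist reversals) and BLOCKSWAP. It is not taken from the paper: the function names, the use of a Python list for L, and the 0-based indices (the text indexes from 1) are assumptions made for the example.

```python
def reverse(L, lo, hi):
    """Reverse L[lo:hi] in place using only pairwise exchanges (SWAP)."""
    hi -= 1
    while lo < hi:
        L[lo], L[hi] = L[hi], L[lo]
        lo += 1
        hi -= 1


def rotate(L, i, h, l):
    """Circularly shift the h records starting at index i by l places to the left."""
    if h == 0:
        return
    l %= h
    reverse(L, i, i + l)        # reverse the first l records
    reverse(L, i + l, i + h)    # reverse the remaining h - l records
    reverse(L, i, i + h)        # reverse the whole block


def blockswap(L, i, j, h):
    """Exchange the h-record blocks at i and j; order inside each block is kept."""
    for k in range(h):
        L[i + k], L[j + k] = L[j + k], L[i + k]
```

The three reversals together perform at most ⌊l/2⌋ + ⌊(h − l)/2⌋ + ⌊h/2⌋ ≤ h exchanges, which agrees with the bound of h SWAP invocations stated above.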


3.  Selection:  A  fundamental  file  processing  tool 

Suppose L contains two sorted sublists, L1 and
L2, with respective sizes n1 and n2, where n1 + n2
= n. We seek to transform stably L1 into two
sorted sublists L3 and L4, where L3 contains the
records whose keys are not found in L2, and L4
contains those whose keys are. Thus we accept
L = L1L2, and SELECT keys from L1 that are
contained in L2, and accumulate them in L4,
where our output is of the form L3L4L2.
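
To make the intended output concrete, here is a small Python sketch of what SELECT computes. It is only a specification of the result: it uses O(n) extra space and is therefore not the in-place method developed below. Records are modeled as (key, payload) tuples, and the name select_reference is hypothetical.

```python
def select_reference(L1, L2):
    """Return (L3, L4): the records of sorted L1 whose keys are absent from,
    respectively present in, sorted L2. Relative order is preserved (stable)."""
    L3, L4 = [], []
    j = 0
    for rec in L1:
        # Advance the scan of L2 until its key is no smaller than rec's key.
        while j < len(L2) and L2[j][0] < rec[0]:
            j += 1
        if j < len(L2) and L2[j][0] == rec[0]:
            L4.append(rec)
        else:
            L3.append(rec)
    return L3, L4
```

The desired list L3L4L2 is then the concatenation of the two returned lists followed by L2.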

Any procedure for performing SELECT requires
Ω(n) time since this problem is at least as
difficult as the task of verifying whether L1 and
L2 contain the same keys, which needs Ω(n) comparisons.
Similarly, any procedure for SELECT
trivially requires Ω(1) extra space. Unfortunately,
the obvious methods for attaining either lower
bound alone violate the other bound.

However, we shall now proceed to prove that
both lower bounds can be simultaneously achieved.
In order to do so, we shall employ the concepts of
internal buffering and block rearranging. This general
type of approach, in which O(√n) blocks each
of size O(√n) are exchanged, can be traced back
to the seminal work on unstable merging described
in [5].

We note that, as with the other results proved
here, it is sufficient to consider only sorted inputs.
That is, it is easy to show that any algorithm for
performing SELECT when L1 and L2 are unsorted
needs Ω(n log n) time, because otherwise a
faster algorithm could be employed to solve set
equality, contradicting the main result of [7]. Thus
dealing with unsorted inputs is not an issue, since
we know from [8] that we can stably sort in
O(n log n) time and O(1) extra space.

Theorem 1. SELECT can be stably performed
simultaneously in linear time and constant extra
space.

Proof. Our proof is by means of algorithm construction,
emphasizing simplicity of exposition
rather than efficiency of implementation. We shall
henceforth use the term L3 record (L4 record) to
denote a record initially in L1 that is to go to L3
(L4). Similarly, the term L3 block (L4 block) shall
be used to refer to a block containing only L3
(L4) records.

Step 1. (BUFFERFILL)

Let s = ⌊√n1⌋. We first attempt to fill an internal
buffer of size s with records having the (first
copy of the) s smallest distinct keys that are to go
to L4. Thus we seek to convert L1 into the form
ABC, where B is the buffer (whose contents need
be in no special order), A results from obtaining
B, and C is a suffix of L1 whose records have not
been disturbed. We construct B by conducting a
left-to-right scan of L1 in conjunction with a
left-to-right scan of L2. When a comparison of
adjacent keys in L1 reveals that a record R with a
new distinct key has been found, and when L2
indicates that R is also to go to L4, then we
coalesce R into B. That is, we increase the size of
B by 1, and advance the scan to the record to the
immediate right of R. Otherwise, we exchange R
with the current leftmost element of B, thereby
shifting B one position to the right and leaving its
size unchanged. Therefore, we "roll" B toward the
right in an attempt to load it with s records
having distinct keys.

If L1 is exhausted before B is filled, then we go
immediately to Step 4. Otherwise, as soon as B
has grown to size s, we ROTATE the sublist BC
to yield the list ACBL2.

Since only B has become disordered, and since
B contains only distinct keys, sorting B in a
subsequent step will ensure stability. Clearly, O(n)
time and O(1) space suffice for the first step.
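
The rolling-buffer scan of Step 1 can be sketched as follows. This is an illustrative Python rendering under stated assumptions, not the authors' code: records are (key, payload) tuples, indices are 0-based, L holds L1 followed by L2, and the function name bufferfill is hypothetical. The final ROTATE of BC into CB is omitted.

```python
def bufferfill(L, n1, s):
    """Roll a buffer B rightward through L1 = L[0:n1].

    On return, A = L[0:start], B = L[start:start+size], and the rest of L1
    is the undisturbed suffix C. B holds the first copy of each of the
    smallest distinct L1 keys that also occur in L2 = L[n1:].
    """
    start, size = 0, 0        # B occupies L[start:start+size]
    j = n1                    # left-to-right scan position in L2
    last_key = None           # key of the L1 record scanned most recently
    i = 0                     # scan position in L1; always start + size
    while i < n1 and size < s:
        key = L[i][0]
        while j < len(L) and L[j][0] < key:
            j += 1
        in_l2 = j < len(L) and L[j][0] == key
        if key != last_key and in_l2:
            size += 1         # coalesce: record i lies just right of B
        else:
            # Exchange R with the leftmost buffer element; B rolls right.
            L[start], L[i] = L[i], L[start]
            start += 1
        last_key = key
        i += 1
    return start, size
```

If the returned size is smaller than s, the scan exhausted L1 and the special case of Step 4 applies. Note that A keeps the original relative order of its records, since each non-buffer record is written to the next free position on the left.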

Step 2. (BLOCKIFY)

Our next step is the most complex. Let n3 (n4)
denote the number of records in L1 that are to go
to L3 (L4), where n1 = n3 + n4. (We assume that
n3 and n4 were precomputed with a linear scan of
L1 and L2 and that, without loss of generality,
both quantities are nonzero.) Thus n3 = s·t3 + e3
and n4 = s·t4 + e4 for unique nonnegative integers
t3, e3, t4 and e4, where e3, e4 < s. We now seek to
use B to transform ACB into a list of t3 + t4 + 2
blocks, the first a (possibly empty) block of size
e3, followed by t3 + t4 blocks of size s, followed
by one last (possibly empty) block of size e4,
where the first t3 + 1 blocks contain L3 and the
next t4 + 1 blocks contain L4.
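
The block sizes targeted by this step follow directly from the two divisions above. A small Python sketch of that arithmetic (the helper name block_layout is hypothetical and serves only to illustrate the bookkeeping):

```python
def block_layout(n3, n4, s):
    """Return (t3, e3, t4, e4, sizes) with n3 = s*t3 + e3 and n4 = s*t4 + e4.

    sizes lists the t3 + t4 + 2 block sizes from left to right: one block of
    size e3, then t3 + t4 blocks of size s, then one block of size e4.
    """
    t3, e3 = divmod(n3, s)
    t4, e4 = divmod(n4, s)
    sizes = [e3] + [s] * (t3 + t4) + [e4]
    return t3, e3, t4, e4, sizes
```

For example, with n3 = 7, n4 = 5 and s = 3 the sizes are 1, 3, 3, 3, 2, which sum to n1 = 12.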

Viewing ACB in this form, we let the rightmost
nonempty block (which has size e4 or, if e4 = 0,
size s and which is now occupied by B or a part of
B) become an L4 block. The block to its immediate
left will become an L3 block. (Recall that
both n3 and n4 exceed zero, so that these two
blocks are needed.) We now begin scanning AC
and L2 from right to left. When we find an L4
record, we exchange it with the rightmost buffer
element of the current L4 block. For an L3 record,
we exchange it with the rightmost buffer
element of the current L3 block (unless, of course,
the current L3 block contains no buffer elements).
Therefore, the buffer is, in general, broken into
two pieces, each to the left of the growing edge of
a block. Should an L3 block be filled before the
current L4 block is completed, then we simply
begin a new L3 block to the immediate left of the
newly-filled L3 block. When the current L4 block
is filled, in which case the buffer now occupies s
contiguous record locations, we begin a new L4
block to the immediate left of the current L3
block. At this point, we reverse the new-block
rule. (That is, we begin a new L4 block to the
immediate left of a newly filled L4 block until the
current L3 block is filled, at which time the buffer
again occupies s contiguous record locations and
we begin a new L3 block to the immediate left of
the current L4 block.) We continue this process of
building blocks and reversing the new-block rule
as necessary until all of AC has been examined.

If e3 = 0, then the buffer now occupies the
leftmost nonempty block, which has size s. If
e3 > 0, then we use BLOCKSWAP to exchange
the e3 buffer records in the leftmost nonempty
block with the rightmost e3 records in the current
L3 block, thereby consolidating the buffer into
one s-sized block. At this time, the leftmost block
of L3 (of size e3) is finished.

Again, B is the only block that is internally
disordered, ensuring stability. O(n) time and O(1)
space are sufficient for this step as well.

Step 3. (BLOCK REARRANGEMENT)

L3 is now made up of a sequence of blocks
interspersed with another sequence constituting
L4. Except for the buffer, each sequence of blocks
as well as the elements within each block are in
the proper order. We now use the buffer to separate
the two sequences.

We invoke SORT on the buffer and, since our
choice of s ensures that t3 + t4 ≤ s, use the buffer
to "remember" the two interspersed sequences.
That is, we use a scan of L2 to classify each block
as type L3 or L4, exchanging the t3 smallest buffer
elements with the respective heads of the t3 s-sized
L3 blocks, and exchanging the t4 − 1 largest buffer
elements with the appropriate heads of the t4 − 1
s-sized L4 blocks other than B. We next use
BLOCKSWAP to move the buffer to the position
of the rightmost s-sized block. Then we invoke
BLOCKSORT on the first t3 + t4 − 1 s-sized
blocks and restore the contents of the buffer.
Finally, if e4 > 0, we use BLOCKSWAP to exchange
the e4 L4 records to the immediate right of
the buffer with the leftmost e4 records in the
buffer. Thus we have produced L3(L4 − B)BL2,
and now go to Step 5.

Since both SORT and BLOCKSORT operate
only on distinct keys, our actions at this step are
stable. Also, O(n) time and O(1) space suffice,
since each sort involves O(√n) keys.


Step 4. (SPECIAL CASE)

Suppose there are only s' < s distinct-keyed
records to go to L4. Thus Step 1 has replaced L1
with AB, where B is smaller than desired. (Without
loss of generality, we assume that s' is nonzero.)
Since bringing the buffer's size up to s with
duplicate keys would eliminate stability, we adopt
the following strategy that uses larger blocks of
size b = ⌈n1/s'⌉. Thus, n1 − s' = bt + e for unique
nonnegative integers t and e, where e < b. We
now seek to transform A into a list of t blocks,
the first a block of size b + e, followed by t − 1
blocks of size b.

Viewing A in this form, we compare its rightmost
two blocks. If the number of L4 records is as
large as the number of L3 records in these two
blocks, then we ROTATE each consecutive segment
of L4 records with L3 records to their right
as necessary so that the rightmost block is of type
L4. If there are more L3 records present, we do
likewise to make the rightmost block of type L3.
Having finished the rightmost block, we next consider
the pair of blocks to its immediate left, and
so on. We repeat this process on adjacent pairs of
blocks until there is only one block left, of size
b + e, whose elements we rotate as necessary to
form an L3 block D followed by an L4 block E.
Thus A has taken on the form DEF, where F
contains t − 1 b-sized blocks, each of type L3 or
L4. Now, in a fashion analogous to that of the
general case described in Step 3, we SORT the
buffer and use it to "remember" the two sequences
interspersed in F (note that this is possible
since our choice of b ensures that s' ≥ t).
Then we invoke BLOCKSORT to transform F
into GH where G (H) contains only L3 (L4)
blocks, and restore the contents of the buffer.
Finally, we ROTATE the sublist EG to obtain
L3(L4 − B)BL2.

As with Steps 2 and 3 of the general algorithm,
our actions at Step 4 are stable. Since there are
only s' distinct keys to go to L4, blockifying the
list needs only O(s') rotations, each of length
O(b), guaranteeing the O(n) time and O(1) space
bounds. As in Step 3 of the general algorithm,
sorting blocks and sorting the buffer involve at
most O(√n) keys and need only O(n) time and
O(1) space.


Volume  39,  Number  3 

Step 5. (FINISH UP)

From the left-to-right scan that produced B in
Step 1, we know B contains the leftmost record
from L1 containing each distinct key that is represented
in B, so that we can achieve stability.
Therefore, we now simply scan both L4 − B and
B from right to left, merging them by employing
ROTATE on the rightmost unmerged sequences
of L4 − B and the appropriate leftmost unmerged
sequences of B to produce L3L4L2, the desired
result. O(n) time and O(1) space suffice for this
final step as well, since records from L4 − B are
rotated at most once while those from B, of which
there are only O(√n), are moved at most O(√n)
times. □


4.  Set  operations 

In what follows, suppose we are given the input
list L = XY, where X and Y are two sublists, each
sorted on the key, and each containing no duplicates.
Since the same key may naturally appear
once in X and once in Y, we insist that, in the
spirit of stability, the record represented in the
result of a binary set operation be the one that
occurs first in L. As with the problem of selection
addressed in the last section, the elementary binary
set operations union, intersection, difference
and exclusive OR are each at least as difficult as
verification, and therefore require Ω(n) time and,
of course, Ω(1) space.

Theorem 2. Set union, intersection, difference and
exclusive OR can be stably performed simultaneously
in both linear time and constant extra space.

Proof. The tools now at hand make the proof easy.
We first identify these fundamental tools (each of
which is stable and requires only O(n) time and
O(1) extra space). From [8], we obtain MERGE.
From [2], we obtain DUPLICATE-KEY EXTRACT.
From the work in the previous section,
we obtain SELECT.

We invoke MERGE followed by DUPLICATE-KEY
EXTRACT to produce X ∪ Y. We
perform SELECT to yield both X ∩ Y and X − Y.
To achieve X ⊕ Y, we invoke SELECT on XY
producing X1X2Y, ROTATE X2 and Y yielding
X1YX2, perform SELECT on YX2 producing
X1Y1Y2X2, and finally MERGE X1 and Y1. □
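
The composition used in this proof can be traced with simple out-of-place stand-ins for the primitives. The Python sketch below is an illustration only: it uses O(n) extra space, so it shows which records each operation returns rather than how the in-place methods of [2], [8] and Section 3 work. Records are (key, payload) tuples and all function names are hypothetical.

```python
def merge(A, B):
    """Stable merge of two key-sorted lists; on equal keys, A's record goes first."""
    out, i, j = [], 0, 0
    while i < len(A) and j < len(B):
        if B[j][0] < A[i][0]:
            out.append(B[j]); j += 1
        else:
            out.append(A[i]); i += 1
    return out + A[i:] + B[j:]


def extract_duplicates(S):
    """Keep only the first record of each run of equal keys in sorted S."""
    return [r for k, r in enumerate(S) if k == 0 or S[k - 1][0] != r[0]]


def select(A, B):
    """Split sorted A into (records with keys not in B, records with keys in B)."""
    keys = {r[0] for r in B}
    return [r for r in A if r[0] not in keys], [r for r in A if r[0] in keys]


def set_operations(X, Y):
    X1, X2 = select(X, Y)      # X1 = X - Y, X2 = X intersect Y
    Y1, _ = select(Y, X2)      # Y1 = Y - X
    return {
        "union": extract_duplicates(merge(X, Y)),
        "intersection": X2,
        "difference": X1,
        "exclusive_or": merge(X1, Y1),
    }
```

Because ties in the merge favor X, the union keeps the record that occurs first in L = XY, as required above.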

5.  Multiset  operations 

For file processing applications, a list or sublist
may, of course, contain a number of records with
the same key. How then are multiset operations to
be defined? Curiously, there is no definitive set
theory answer, for example, as to whether
{1, 2, 2} ∩ {2, 2, 3} is {2}, {2, 2} or {2, 2, 2, 2}.
While the issue is seldom even addressed in the
literature, at least one source [6] has mentioned
the second interpretation. However, we suggest
that the third interpretation may be particularly
reasonable for file operations, since a record typically
contains more information than the key
alone.

Suppose we are given the input list L = XY,
where X and Y are two sublists, each sorted on
the key. We define multiset operations as follows:

union X ∪ Y = the stably sorted list containing all
records in X and all records in Y,
intersection X ∩ Y = the stably sorted list containing
all records whose keys are in both X and Y,
difference X − Y = the stably sorted list containing
all records whose keys are in X but not Y,
exclusive OR X ⊕ Y = the stably sorted list containing
all records whose keys are in X or Y,
but not both.
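
These four definitions can be written down directly as a reference specification. The Python sketch below is an assumption-laden illustration (records as (key, payload) tuples, a hypothetical function name, and O(n) extra space), not the in-place method of Theorem 3; it relies on the fact that Python's sorted is stable.

```python
def multiset_operations(X, Y):
    """Reference semantics for the multiset operations on key-sorted X and Y."""
    keys_x = {k for k, _ in X}
    keys_y = {k for k, _ in Y}
    # A stable sort of XY puts X's records before Y's on equal keys.
    merged = sorted(X + Y, key=lambda rec: rec[0])
    return {
        "union": merged,
        "intersection": [r for r in merged if r[0] in keys_x and r[0] in keys_y],
        "difference": [r for r in X if r[0] not in keys_y],
        "exclusive_or": [r for r in merged if (r[0] in keys_x) != (r[0] in keys_y)],
    }
```

With key lists 1, 2, 2 for X and 2, 2, 3 for Y, the intersection contains four records with key 2, which is the third interpretation discussed above.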

We observe that, with multiset operations as
defined above, De Morgan's law holds. Also, each
operation requires Ω(n) time and Ω(1) extra space.

Theorem 3. Multiset union, intersection, difference
and exclusive OR can be stably performed simultaneously
in both linear time and constant extra space.

Proof. Again, the tools now at hand make the
proof easy. We use MERGE to produce X ∪ Y.
We invoke SELECT to yield X − Y. For the
remaining two operations, we first perform the
SELECT-ROTATE-SELECT series of primitives
described in the proof of Theorem 2 to
produce X1Y1Y2X2. We MERGE X1 and Y1 to
obtain X ⊕ Y. To achieve X ∩ Y, we ROTATE
Y2X2 to obtain X2Y2, then MERGE X2 and Y2. □

While our interpretations have been chosen for
consistency in file processing applications, we observe
that the interpretations of [6] equate union
with the maximum number of occurrences, intersection
with the minimum number of occurrences,
and so on. Thus De Morgan's law holds for these
interpretations as well, with the binary set operations
reducing to special cases of these multiset
definitions. (Stability, however, now seems to have
no obvious meaning.) By modifying slightly the
way in which our SELECT algorithm scans L2 to
determine which records go to L4, we note that
each multiset operation of [6] can also be performed
simultaneously in both linear time and
constant extra space.

6.  Remarks 

Given today's low cost of computer memory
components, it may be that these methods can be
used to best advantage when working with large
external files. In such an environment, in-place
operations can be viewed as a means for increasing
the effective size of internal memory, which
can result in reduced I/O time (through fewer
transfers, more and larger buffers, etc.), and a
corresponding drastic reduction in the overall
elapsed time for file processing. Furthermore, we
think it is likely that each time-space optimal
operation can be performed directly (as with the
proof of Theorem 1) and in parallel (as with the
merge and sort routines recently devised in [1]).

To gauge the practical potential of these time-space
optimal schemes, we briefly focus on key
comparisons and record exchanges. These two
fundamental operations are usually regarded as by
far the most time consuming for internal file
processing. Both require storage-to-storage instructions
for most architectures. Moreover, it is
possible to count them independently from the
code of any particular implementation. Therefore,
their total gives a useful estimate of the size of the
linear-time constant of proportionality for each
algorithm presented.

Table 1
Constant of proportionality upper bounds

Operation       Set     Multiset
Union           13      7
Intersection    11.5    32
Difference      11.5    11.5
Exclusive OR    31      31

These worst-case totals are shown in Table 1.
The computation of each value² displayed is
straightforward, and is eliminated from our presentation
for the sake of brevity. (For each
MERGE operation, we use the constant from [3],
a marked improvement over that of [8].)

As for the issue of constant extra space, none
of our methods explicitly requires more than a
couple of dozen additional storage cells for use as
pointers, counters and the like.

2 Because our procedures were devised with an eye toward
[...] computations presuppose some sort of (possibly unrealizable)
worst-case scenario, the figures we show are rather
conservative upper bounds.

Abstract

Gate matrix layout is a well-known NP-complete problem that arises at the heart of
a number of VLSI layout styles. Despite its apparent general intractability, it has recently been
shown that it can be solved in O(n²) time whenever the number of tracks is fixed. Curiously, the
proof of this is nonconstructive, based on finite but unknown obstruction sets. What then are
such sets, and what is their underlying structure? The main result we report in this paper is
a proof that the obstruction set for three tracks contains exactly 110 elements. We also describe
a number of methods for obstruction identification that extend to any number of tracks.

Key  words:  Circuit  layout;  Finite-basis  characterizations;  Polynomial-time  complexity 


1.  Introduction 

Traditionally, decision problems³ have been classified as either "easy" or "hard",
dependent on whether low-degree polynomial-time decision algorithms exist to solve
them. Until recently, one could expect any proof of easiness to be constructive. That is,
the proof itself should provide "positive evidence" in the form of the promised
polynomial-time decision algorithm. Surprising advances, however, dramatically alter
this appealing picture. See, for example, [6-8] for applications of tools from [14-17]
that nonconstructively establish the existence of low-degree polynomial-time decision
algorithms for a number of challenging combinatorial problems.

suffices to transform decision algorithms into search or optimization algorithms [3, 9, 10].

In general, problems amenable to this approach are modeled as graphs. The
algorithm can decide whether a given encoding of a problem is a "yes" instance
or a "no" instance by determining if it contains an element of a finite basis of forbidden
graphs (the obstruction set). Strikingly, the underlying theory does not
tell how to identify all members of such a set, the cardinality of the set, or even
the order of the largest member of the set. The only fact we are given is that the set
is finite.

Perhaps the best-known example of an algorithm based on such "negative evidence"
is the celebrated finite-basis characterization of planar graphs [13]: a graph is
planar if and only if it contains no member of a two-element obstruction set in the
topological order. The main result we present in this paper is a similar finite-basis
characterization for the three-track gate matrix layout problem: a graph represents
a circuit with a three-track layout if and only if it contains no member of a 110-element
obstruction set in the minor order.

Interestingly, it has recently been recognized [10] that gate matrix layout with
parameter k is identical to the path-width problem with parameter k − 1. (That is,
a graph G represents a circuit with a k-track layout if and only if G has a path
decomposition [14] of width at most k − 1.) Because the work we report here was
originally derived in terms of circuit layout, and because gate matrix layout has
received considerable attention in the literature, we shall neither state nor prove our
results in terms of path-width. Instead, we only note that it is fortuitous that our
efforts contribute to the understanding of this important width metric.

Our proofs are of two general types. Some describe characteristics of obstructions,
and thereby help to delimit the search space. Others show how a number of obstructions
can be constructively obtained. Since these techniques alone are sufficient to
bound but insufficient to isolate all obstructions, many obstructions were identified
with the aid of exhaustive case-checking. To assist in this heroic undertaking, massive
computational power⁴ was used to verify that each obstruction represents a circuit
that has no three-track layout, and to check that each proper minor of each obstruction
represents a circuit that does have a three-track layout.

In  the  next  two  sections,  we  discuss  relevant  background  information.  In  Section  4, 
we  present  the  notation  and  terminology  used  throughout  the  remainder  of  this 
paper.  In  Sections  5  and  6,  we  prove  several  general  results  and  constructions  that 
hold  for  any  number  of  tracks.  In  Section  7,  we  determine  some  specific  properties 
required  of  three-track  layouts  and  isolate  all  nonouterplanar  obstructions.  In  Section 
8,  we  enumerate  the  entire  three-track  obstruction  set  and  prove  that  this  set  is 
complete.  In  the  final  two  sections,  we  summarize  our  work  and  pose  a  few  related 
open  problems. 


4 We employed the dynamic programming formulation as given in [4] and as streamlined in [12]. The
algorithm's consumption of both time and space was enormous; its use was generally restricted to instances
of moderate size (graphs with no more than about twenty edges) on an IBM 3090-300E.

2. The minor order

A graph H is less than or equal to a graph G in the minor order, written H ≤_m G, if
and only if a graph isomorphic to H can be obtained from G by a series of these two
operations: taking a subgraph and contracting an arbitrary edge. A family F of graphs
is said to be closed under the minor order if the facts that G is in F and that H ≤_m G
together imply that H must be in F. The obstruction set for a family F of graphs is the
set of graphs in the complement of F that are minimal in the minor order. Therefore, if
F is closed under the minor order, it has the following characterization: G is in F if and
only if H ≰_m G for every H in the obstruction set for F.

Theorem 2.1 [17]. Any set of finite graphs contains only a finite number of minor-minimal
elements.

Theorem 2.2 [16]. For every fixed graph H, the problem that takes as input a graph
G and determines whether H ≤_m G is solvable in polynomial time.

Theorems  2.1  and  2.2  guarantee  only  the  existence  of  a  polynomial-time  decision 
algorithm  for  any  minor-closed  family  of  graphs.  In  particular,  no  proof  of  Theorem 
2.1  can  be  entirely  constructive  [10]. 

Letting n denote the number of vertices in G, the time bound for algorithms ensured
by these theorems is O(n³). If F excludes a planar graph, then the bound reduces to
O(n²). In general, these algorithms possess enormous constants of proportionality
[15], although new techniques greatly mitigate them [18], and methods specific to
layout problems such as the one we address here lower them even more [10].


3.  The  gate  matrix  layout  problem 

Gate matrix layout is a combinatorial problem that arises in several VLSI layout
styles, including gate matrix, PLAs under multiple folding, Weinberger arrays and
others. It was originally posed in terms of operations on Boolean matrices. Formally,
we are given an n × m Boolean matrix M and an integer k, and are asked whether we
can permute the columns of M so that, if in each row we change to * every 0 lying
between the row's leftmost and rightmost 1, then no column contains more than k 1s
and *s. Such a * is termed a fill-in. We refer the interested reader to [4] for sample
instances, figures and additional background on this challenging problem.

Although the general problem is NP-complete, it has been shown that, for any
fixed value of k, an arbitrary instance can be mapped to an equivalent instance with
only two 1s per column, then modeled as a graph on n vertices such that the family of
"yes" instances is closed under the minor order and excludes a planar graph.

Theorem 3.1 [6]. For any fixed k, gate matrix layout can be decided in O(n²) time.

Fig. 1. Obstruction set for 2-GML.


Thanks  to  this  mapping  defined  on  arbitrary  Boolean  matrices,  it  suffices  to  restrict 
our  attention  to  connected,  simple  graphs. 

In the sequel, we shall use the term k-GML to denote the k-track variant of gate
matrix layout. Thus, an obstruction for k-GML is a graph that represents a "no"
instance for parameter k (it has no k-track layout) and that is minimal for parameter
k (each of its proper minors does have a k-track layout). For 1-GML, it is trivial to see
that the obstruction set contains only K2. For 2-GML, the only obstructions are
K3 and S(K1,3)⁵ [2] (see Fig. 1). (The connected graphs that are "yes" instances for
2-GML are known as caterpillars.)


4.  Definitions  and  notation 

Let G denote a graph, with vertex set V and edge set E, and let M denote an
incidence matrix for G, augmented as necessary with fill-ins. For convenience, we
assume a labeling for V and some appropriate bijection between these labels and the
rows of M. Thus we shall, for example, refer merely to "row u" rather than to the more
precise but cumbersome "row that corresponds to vertex u".

We  term  the  matrix  M  a  permutation  for  G,  since  the  ordering  of  its  columns 
determines  an  ordering  for  E.  The  cost  of  a  column  is  the  total  number  of  Is  and  fill-ins 
it  contains.  The  cost  of  a  permutation  is  the  maximum  cost  of  any  of  its  columns.  The 
cost  of  a  graph  is  the  minimum  cost  of  any  of  its  permutations.  These  costs  represent 
the  number  of  tracks  required  in  a  layout  of  the  associated  circuit. 
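
The cost definitions above can be made concrete with a small sketch. It assumes a permutation is given as a left-to-right list of edges (one edge per column) and that fill-ins occupy exactly the columns strictly between the leftmost and rightmost 1 of a row; the function names are illustrative and not taken from the paper. The brute-force search is only feasible for very small graphs.

```python
from itertools import permutations

def permutation_cost(edge_order):
    """Cost of a permutation given as a left-to-right list of edges (columns)."""
    first, last = {}, {}
    for col, edge in enumerate(edge_order):
        for vertex in edge:
            first.setdefault(vertex, col)  # leftmost 1 in this row
            last[vertex] = col             # rightmost 1 in this row
    # A row contributes a 1 or a fill-in to every column of its span.
    return max(
        (sum(first[v] <= col <= last[v] for v in first)
         for col in range(len(edge_order))),
        default=0,
    )

def graph_cost(edges):
    """Minimum cost over all column orderings (brute force, tiny graphs only)."""
    return min(permutation_cost(order) for order in permutations(edges))

K3 = [("a", "b"), ("b", "c"), ("a", "c")]
PATH = [("a", "b"), ("b", "c"), ("c", "d")]
print(graph_cost(K3), graph_cost(PATH))  # prints "3 2"
```

Under these assumptions K_3 needs three tracks while a path needs only two, which agrees with K_3 being an obstruction for 2-GML.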

A vertex of degree one is a pendant vertex. A (simple) path is a sequence of distinct
vertices v_1, v_2, ..., v_h such that edge v_i v_{i+1} ∈ E for 1 ≤ i < h. Vertices that form such
a sequence are consecutive. A pendant path is a path in which v_1 has degree three or
more, v_h has degree one, and each v_i, 1 < i < h, has degree two. The length of such
a path is h - 1, the number of edges it contains.

A  planar  graph  along  with  a  planar  embedding  is  called  a  plane  graph.  Similarly,  an 
outerplanar  graph  [11]  along  with  an  outerplanar  embedding  is  called  an  outer  plane 
graph.  The  regions  of  the  plane  bounded  by  the  embedding  are  called  faces.  (The 
unbounded  region  is  known  as  the  “exterior”  face.  Unless  otherwise  noted,  a  face  is 
understood to mean an interior face.) Two faces in a plane graph are edge adjacent if
their intersection contains one or more edges. Two faces are vertex adjacent if their
intersection contains one or more vertices but no edges.

5 S(K_{1,3}) is the graph obtained by subdividing each edge of K_{1,3}.

Given  a  permutation  for  a  graph,  the  span  for  a  vertex  is  the  collection  of  columns 
that  contain  either  a  1  or  a  fill-in  in  its  row.  If  the  graph  is  plane,  then  the  span  for 
a  face  is  the  collection  of  columns  that  lie  between  the  leftmost  and  rightmost  columns 
that  represent  edges  of  the  face,  inclusive. 

Finally,  we  assume  the  reader  is  familiar  with  standard  graph  operations,  in 
particular subtraction (\), union (∪) and intersection (∩) [1].


5.  Obstruction  characterization  tools 

In this section and the next, we shall derive a number of results that help to
characterize or construct obstructions (see footnote 6 on our treatment of proofs).
These results hold for arbitrary k.

Lemma 5.1. No obstruction for k-GML contains two or more pendant paths of length
one incident on a common vertex.

Proof. Assume otherwise, and let G denote an obstruction for k-GML with pendant
vertices v and w, both adjacent to vertex u. Let G' = G\{w}. Since G is minimal for
parameter k, G' possesses a permutation M' with cost at most k. (Recall that
permutations are augmented only as necessary with fill-ins, and so M' has no fill-ins
whatsoever in row v.) Consider the matrix M obtained from M' by adding row w and
placing column uw adjacent to column uv. The cost of column uw is identical to that of
column uv, because still no fill-ins are needed in row v. Moreover, the costs of all other
columns remain unchanged, because no fill-ins are required anywhere in row w.
Therefore, M is a permutation for G with cost at most k, contradicting our assumption
that G has no k-track layout. □
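
The column insertion used in this proof can be checked mechanically on a small case. The sketch below is illustrative only: it assumes a permutation is a left-to-right list of edges with fill-ins between the leftmost and rightmost 1 of each row, and the helper names and the example graph (a triangle u, x, y with pendant vertex v on u) are hypothetical.

```python
from itertools import permutations

def permutation_cost(edge_order):
    """Maximum column cost, counting 1s and fill-ins."""
    first, last = {}, {}
    for col, edge in enumerate(edge_order):
        for vertex in edge:
            first.setdefault(vertex, col)
            last[vertex] = col
    return max(
        (sum(first[v] <= col <= last[v] for v in first)
         for col in range(len(edge_order))),
        default=0,
    )

def add_twin_pendant(edge_order, u, v, w):
    """Insert column uw immediately to the right of column uv."""
    order = list(edge_order)
    idx = next(i for i, e in enumerate(order) if set(e) == {u, v})
    order.insert(idx + 1, (u, w))
    return order

# G': a triangle u, x, y with one pendant vertex v attached to u.
G_PRIME = [("u", "x"), ("x", "y"), ("u", "y"), ("u", "v")]

# For every permutation of G', adding the second pendant edge uw next to uv
# should leave the cost unchanged.
unchanged = all(
    permutation_cost(add_twin_pendant(order, "u", "v", "w")) == permutation_cost(order)
    for order in permutations(G_PRIME)
)
print(unchanged)
```

The check covers every permutation of G', not only optimal ones, since the argument in the proof does not depend on optimality.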

Lemma 5.2. No obstruction for k-GML contains a pendant path of length greater than
two.

Proof. Assume otherwise, and let G denote an obstruction for k-GML with pendant
path ..., u, v, w of length three or more. Let G' = G\{w}. Because G is minimal and
because G' <_m G, there is an optimal permutation M' for G' with cost at most k. Since
u has degree two, we may assume by symmetry that column uv contains the rightmost
1 in row u. Consider the matrix M obtained from M' by adding row w and inserting
column vw to the immediate right of column uv. Since column vw does not need
a fill-in in row u, its cost is the same as that of column uv. The costs of all other
columns remain unchanged, because no fill-ins are required in row v or in row w.
Therefore, M is a permutation for G with cost at most k, contradicting the assumption
that G has no k-track layout. □

6 As a matter of style, we shall omit proofs when they follow immediately from previous results, and shall
merely point the reader to an earlier proof when analogous arguments suffice. We realize that the
responsibility for deciding when to suppress details in this presentation is ours alone, and remark that full
proofs, even of the corollaries, can be found in [12].

Thus,  a  pendant  vertex  is  an  endpoint  of  either  a  pendant  path  of  length  one  (which 
we  shall  henceforth  call  a  pendant  edge)  or  a  pendant  path  of  length  two  (which  we 
shall  without  ambiguity  henceforth  term  simply  a  pendant  path,  omitting  reference  to 
its  length). 

Lemma  5.3.  If  a  graph  has  a  pendant  path,  then  there  is  an  optimal  permutation  for  that 
graph  in  which  the  edges  of  the  path  are  represented  by  adjacent  columns. 

Proof.  Let  G  denote  a  graph  with  pendant  path  u,  v,  w,  and  let  M  denote  an  optimal 
permutation  for  G.  Suppose  that  columns  uv  and  vw  are  not  adjacent,  and  that  column 
uv  is  to  the  left  of  column  vw.  The  rightmost  1  in  row  u  must  be  either  (1)  in  column  uv, 
(2)  in  a  column  between  columns  uv  and  vw,  or  (3)  in  a  column  to  the  right  of  column 
vw. 

Suppose  (1)  holds.  We  construct  a  new  matrix  M'  from  M  by  moving  column  vw  to 
the  left  until  it  is  to  the  immediate  right  of  column  uv.  Since  column  vw  does  not 
require  a  fill-in  in  row  u,  its  cost  is  no  more  than  that  of  column  uv.  Moreover,  no 
column  now  requires  a  fill-in  in  row  v,  and  the  cost  of  M'  is  no  more  than  that  of  M. 

Suppose (2) holds. For the sake of discussion, assume that the rightmost 1 in row
u  is  in  column  ux.  We  construct  matrix  M'  from  M  by  first  moving  column  uv  to  the 
right  until  it  is  to  the  immediate  right  of  column  ux.  Since  column  ux  had  a  fill-in  in  row 
v,  the  cost  of  column  uv  is  no  more  than  the  original  cost  of  column  ux.  To  complete 
the  construction  of  M',  move  column  vw  to  the  left  until  it  is  to  the  immediate  right  of 
column  uv.  Since  column  vw  does  not  require  a  fill-in  in  row  u,  its  cost  is  no  more  than 
that  of  column  uv.  Therefore,  the  cost  of  M'  is  no  greater  than  that  of  M. 

Suppose  (3)  holds.  We  construct  matrix  M'  from  M  by  moving  column  uv  to  the 
right  until  it  is  to  the  immediate  left  of  column  vw.  Since  column  vw  has  a  fill-in  in  row 
u,  the  cost  of  column  uv  is  no  more  than  the  cost  of  column  vw.  Thus,  the  cost  of  M' 
cannot  exceed  that  of  M.  □ 

Lemma 5.4. No obstruction for k-GML contains more than three pendant paths
incident on a common vertex.

Proof. Assume otherwise, and let G denote an obstruction for k-GML with four or
more pendant paths incident on vertex u. Let u, v, w be one such pendant path, and let
G' = G\{uv, vw}. Because G is minimal and because G' <_m G, G' possesses a permutation
M' with cost at most k in which (due to Lemma 5.3) each pendant path incident
on u is represented by a pair of adjacent columns. Let the second such pair of columns
represent pendant path u, x, y. (We choose the second pair of columns since this
guarantees that column xy has a fill-in in row u.) We construct matrix M from M' by
adding rows v and w, and by placing columns uv and vw to the immediate right of
columns ux and xy. Since no fill-ins are required in rows x and y, the costs of columns
uv and vw are the same as the costs of columns ux and xy, respectively. Therefore, M is
a permutation for G with cost at most k, contradicting our assumption that G has no
k-track layout. □

Lemma 5.5. For k > 2, no obstruction for k-GML contains more than two consecutive
vertices of degree two.

Proof. Assume otherwise, and let G denote an obstruction for k-GML, k > 2, with
consecutive vertices u, v and w, each of degree two. Let G' be the graph obtained from
G by contracting the edge uv to u. (Observe that u and w each retain degree two in G':
no increase in degree is possible; a decrease would imply either that G is K_3 and hence
not an obstruction for k > 2, or that G is not connected and hence not an obstruction
for any k.) Because G is minimal and because G' <_m G, G' must possess an optimal
permutation M' with cost at most k. From the facts that M' has no unnecessary fill-ins
in rows u and w and that both u and w have degree two, it follows that either (1) the
spans for these two rows overlap only in column uw or (2) the span for one properly
contains the span for the other.

Suppose  (1)  holds.  For  the  sake  of  discussion,  assume  the  single  column  of  overlap 
(column  uw)  contains  the  rightmost  1  in  row  u  and  thus  the  leftmost  1  in  row  w.  We 
construct  at  no  extra  cost  a  new  matrix  M  from  M',  by  replacing  column  uw  with 
columns  uv  and  vw. 

Suppose  (2)  holds.  For  the  sake  of  discussion,  assume  the  span  for  u  properly 
contains  the  span  for  w,  with  column  uw  the  rightmost  in  both  spans.  Let  c  denote  the 
column  that  contains  the  leftmost  1  in  row  w.  We  construct  at  no  extra  cost  a  new 
matrix M from M', by replacing column uw with column vw, and by inserting column
uv  to  the  immediate  left  of  column  c. 

In either case, M is a permutation for G with cost at most k, contradicting the
assumption that G has no k-track layout. □

Lemma 5.6. Suppose G contains a pair of adjacent vertices, u and v, each of degree two.
If  G'  is  obtained  from  G  by  contracting  the  edge  uv  to  u,  adding  a  new  vertex  w,  and 
adding  the  edge  uw,  then  the  cost  of  G  equals  that  of  G'. 

Proof. Let G, G', u, v and w be as defined in the statement of the lemma. Let t (x) denote
the other vertex adjacent to u (v) in G. Let M denote an optimal permutation for G.

We construct a new matrix M' from M by replacing column vx with column ux and
changing the label on row v to w. Row w contains a single 1 and requires no fill-ins.
Any column that now needs a new fill-in in row u originally had a fill-in in row v. Thus,
the cost of M' is no more than that of M. Because M' is a permutation for G', the cost
of G' cannot exceed that of G.

Let M' denote an optimal permutation for G'. Note that, in G', vertex u has degree
three and is adjacent to vertices t, w, and x. It suffices to consider two cases for M', in
that either (1) column uw lies between columns tu and ux or (2) column uw contains the
leftmost 1 in row u.

Suppose (1) holds. We construct a new matrix M from M' by replacing column ux
with column vx and changing the label on row w to v. Any fill-ins required in row v lie
in columns that no longer require fill-ins in row u. Thus, the cost of M is no more than
that of M'.

Suppose (2) holds. If column uw has a fill-in in row t (or row x) then, at no extra cost,
we move column tu (column ux) to the immediate left of column uw. M can now be
constructed as in (1). If column uw has 0s in both row t and row x, then we may assume
that columns tu and ux contain the leftmost 1s in rows t and x, respectively. (To see
this, note that if another column c holds the leftmost 1 in row t (row x), then c has
a fill-in in row u and column tu (column ux) can be placed to the left of c with no
increase in cost.) If column ux is to the right of column tu then, at no extra cost, we
move column uw to the immediate left of column ux. Otherwise, at no extra cost,
we move column uw to the immediate left of column tu. M can now be constructed
as in (1).

Because  M  is  a  permutation  for  G,  the  cost  of  G  cannot  exceed  that  of  G'.  □ 
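
As an informal sanity check of Lemma 5.6, the following sketch applies the transformation to three small graphs and compares brute-force costs. The function names, the vertex labels and the choice of test graphs (C_4, C_5 and a four-vertex path) are assumptions made for illustration; the cost model again assumes fill-ins between the leftmost and rightmost 1 of each row.

```python
from itertools import permutations

def permutation_cost(edge_order):
    first, last = {}, {}
    for col, edge in enumerate(edge_order):
        for vertex in edge:
            first.setdefault(vertex, col)
            last[vertex] = col
    return max(
        (sum(first[v] <= col <= last[v] for v in first)
         for col in range(len(edge_order))),
        default=0,
    )

def graph_cost(edges):
    """Brute-force minimum cost; suitable only for very small graphs."""
    return min(permutation_cost(order) for order in permutations(edges))

def contract_and_hang(edges, u, v, w):
    """Contract edge uv to u, then attach a new pendant vertex w to u."""
    rest = [e for e in edges if set(e) != {u, v}]
    moved = [tuple(u if x == v else x for x in e) for e in rest]
    return moved + [(u, w)]

# In each graph, u and v are adjacent vertices of degree two.
C4 = [("t", "u"), ("u", "v"), ("v", "x"), ("x", "t")]
C5 = [("t", "u"), ("u", "v"), ("v", "x"), ("x", "y"), ("y", "t")]
P4 = [("t", "u"), ("u", "v"), ("v", "x")]

for graph in (C4, C5, P4):
    transformed = contract_and_hang(graph, "u", "v", "w")
    print(graph_cost(graph), graph_cost(transformed))
```

Each printed pair should consist of two equal numbers, as the lemma predicts.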

Corollary 5.7. If G and G' denote graphs as defined in Lemma 5.6, then G is an
obstruction for k-GML if and only if G' is.

Corollary 5.8. No obstruction for k-GML contains two adjacent vertices of degree three
each adjacent to a pendant vertex as well.

Corollary 5.9. No obstruction for k-GML contains a vertex of degree three adjacent to
both a pendant vertex and a vertex of degree two.

Lemma 5.10. Let G denote an arbitrary graph with cost k, and let v denote any vertex of
G. G possesses an optimal permutation in which every column with cost k has a nonzero
entry in row v if and only if G\{v} does not contain an obstruction for (k - 1)-GML.

Proof. Let G denote a graph with cost k and let v denote any vertex of G. If G\{v}
contains an obstruction for (k - 1)-GML, then every optimal permutation for G\{v}
has cost k. Therefore, every optimal permutation for G contains a column with cost
k that has a 0 in row v.

If G\{v} does not contain an obstruction for (k - 1)-GML, then G\{v} possesses an
optimal permutation M' with cost at most k - 1. Consider the matrix M obtained
from M' by adding row v and, for each vertex w adjacent to v in G, inserting column vw
adjacent to any column with a 1 in row w. In every case, the cost of column vw is at most
k. Since v is the only row that may need additional fill-ins, M is an optimal permutation
for G of cost k in which every column with cost k has a nonzero entry in row v. □

Corollary 5.11. Let G denote an obstruction for k-GML and let v denote a vertex of G.
G has cost exactly k + 1, and possesses an optimal permutation in which every column of
cost k + 1 has a 1 or a fill-in in row v.

Lemma  5.12.  Adding  an  edge  to  a  graph  increases  its  cost  by  at  most  one. 

Proof.  Straightforward.  □ 

Lemma 5.13. If G contains K_k as a subgraph, then G has cost at least k and possesses
an optimal permutation in which the edges of K_k are represented in adjacent
columns.

Proof.  Follows  immediately  from  Lemma  4.1  of  [6].  □ 

Corollary 5.14. If G contains K_k as a minor, then G has cost at least k.

Corollary 5.15. K_k is an obstruction for (k - 1)-GML.
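
For very small k, Corollary 5.15 can be spot-checked by exhaustive search. The sketch below is an illustration under the same assumed cost model (fill-ins between the leftmost and rightmost 1 of each row); `complete_graph` and the other names are hypothetical helpers. It reports the cost of K_2, K_3 and K_4, and then confirms that deleting any single edge of K_4 brings the cost down to three.

```python
from itertools import combinations, permutations

def permutation_cost(edge_order):
    first, last = {}, {}
    for col, edge in enumerate(edge_order):
        for vertex in edge:
            first.setdefault(vertex, col)
            last[vertex] = col
    return max(
        (sum(first[v] <= col <= last[v] for v in first)
         for col in range(len(edge_order))),
        default=0,
    )

def graph_cost(edges):
    """Brute-force minimum cost; suitable only for very small graphs."""
    return min(permutation_cost(order) for order in permutations(edges))

def complete_graph(k):
    """Edge list of K_k on vertices 0, ..., k-1."""
    return list(combinations(range(k), 2))

for k in (2, 3, 4):
    print(k, graph_cost(complete_graph(k)))

K4 = complete_graph(4)
# Every one-edge deletion of K_4 should admit a three-track layout.
print(all(graph_cost([f for f in K4 if f != e]) == 3 for e in K4))
```

Contractions are omitted here because contracting any edge of K_4 yields K_3 (after discarding parallel edges), whose cost of three is already reported by the loop.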

Lemma 5.16. Let G denote an arbitrary graph and let H_1 and H_2 denote obstructions
for k-GML. If G ∩ H_1 = G ∩ H_2 = {v} for some vertex v, then the cost of G ∪ H_1 equals
that of G ∪ H_2.

Proof. Let G, H_1, H_2 and v be as defined in the statement of the lemma. Let M denote
an optimal permutation for G ∪ H_1.

Since H_1 is an obstruction for k-GML, some column c of H_1 in M has cost k + 1 in
the rows of H_1. Either c is contained in the span for v (in which case c contains a 1 or
a fill-in in row v), or else the connectedness of H_1 ensures that every column of G lying
between c and the span for v has a fill-in in some other row of H_1.

Due to Corollary 5.11, H_2 possesses a cost k + 1 permutation M_2 in which every
column with cost k + 1 has a 1 or a fill-in in row v. We use M_2 to construct a new
matrix M' from M as follows. We first eliminate the rows of H_1\{v}, then all resultant
columns with at most one 1 (one of which is c). We next insert M_2 into the position
formerly occupied by c (which requires a new row for each vertex of H_2\{v}). No
inserted column can require more fill-ins than did c. Due to the way c was chosen and
its relation to the span for v, no column originally in M can incur an increase in its
number of fill-ins. Thus, the cost of M' is no more than that of M.

Because M' is a permutation for G ∪ H_2, the cost of G ∪ H_2 cannot exceed that of
G ∪ H_1. The inequality is established in the reverse direction analogously. □

Corollary 5.17. If G, H_1 and H_2 denote graphs as defined in Lemma 5.16, and if G ∪ H_1
is an obstruction for k'-GML but G ∪ H_2 is not, then any obstruction for k'-GML
contained as a minor in G ∪ H_2 has the form G ∪ H_2' for some H_2' <_m H_2.

Fig. 2. Constructions used in Lemmas 6.1, 6.2 and 6.4.

Lemma 5.18. Let G be a plane graph with face F. In any permutation for G, every
column in the face span for F has a cost of at least two in the collection of rows that
correspond to the vertices of F, and every interior column of that span has a total cost of
at least three.

Proof.  Straightforward.  □ 
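
Although the proof is omitted, the statement is easy to test exhaustively on a small plane graph. The sketch below is illustrative only: it assumes a permutation is a left-to-right list of edges with fill-ins between the leftmost and rightmost 1 of each row, and it uses a hypothetical example (a 4-cycle a, b, c, d bounding the face F, plus a pendant edge ap). The helper names are not from the paper.

```python
from itertools import permutations

def column_rows(edge_order):
    """For each column, the set of rows that hold a 1 or a fill-in."""
    first, last = {}, {}
    for col, edge in enumerate(edge_order):
        for vertex in edge:
            first.setdefault(vertex, col)
            last[vertex] = col
    return [{v for v in first if first[v] <= col <= last[v]}
            for col in range(len(edge_order))]

def satisfies_lemma_5_18(edge_order, face_edges):
    """Check both bounds of Lemma 5.18 for one face and one permutation."""
    face = {frozenset(e) for e in face_edges}
    face_vertices = set().union(*face)
    rows = column_rows(edge_order)
    cols = [i for i, e in enumerate(edge_order) if frozenset(e) in face]
    left, right = min(cols), max(cols)  # the face span, inclusive
    for col in range(left, right + 1):
        if len(rows[col] & face_vertices) < 2:
            return False  # fewer than two face rows occupied
        if left < col < right and len(rows[col]) < 3:
            return False  # interior column with total cost below three
    return True

FACE = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
GRAPH = FACE + [("a", "p")]

print(all(satisfies_lemma_5_18(order, FACE) for order in permutations(GRAPH)))
```

The check runs over all 120 column orderings of the example, so it exercises the case in which the pendant column falls inside the face span as well as the case in which it falls outside.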


6.  Obstruction  construction  tools 

The  constructions  studied  in  this  section  are  depicted  informally  in  Fig.  2. 

Lemma 6.1. Let G_1, G_2 and G_3 denote disjoint (but not necessarily distinct) obstructions
for k-GML, let v_i denote an arbitrary vertex of G_i for 1 ≤ i ≤ 3, and let v denote an
isolated vertex not in G_1 ∪ G_2 ∪ G_3. The graph G = G_1 ∪ G_2 ∪ G_3 ∪ {v} ∪ {vv_i :
1 ≤ i ≤ 3} is an obstruction for (k + 1)-GML.

Proof. Let G_i, v_i, v and G be as defined in the statement of the lemma. It follows from
Lemma 4.3 of [6] that G has cost k + 2.

We now establish the minimality of G. Due to Corollary 5.11, each G_i, 1 ≤ i ≤ 3,
possesses a cost k + 1 permutation M_i in which every column with cost k + 1 has
a 1 or a fill-in in row v_i. Since G is connected, the removal of a vertex necessarily means
the removal of an edge and, therefore, we only need consider the effect of removing or
contracting a single edge, e. Because of G's symmetry, we may assume that either (1)
e is in G_1, or (2) e = vv_1.

Suppose (1) holds. Let G_1' and G' denote the minors of G_1 and G, respectively, that
are obtained by the removal or contraction of e (for notational simplicity in the case of
a contraction, we insist that e be contracted to v_1 if v_1 ∈ e). Because G_1 is minimal for
parameter k, G_1' possesses a permutation M_1' with cost at most k. Let M denote the
permutation M_2, vv_2, M_1', vv_3, M_3. A column in M_2 or M_3 has cost at most k + 1.
Columns vv_2 and vv_3 each have cost two. Any column in M_1' incurs one additional
fill-in (in row v), bringing its cost to at most k + 1. Thus, M has cost k + 1. We form
M' from M at no extra cost by placing vv_1 adjacent to an arbitrary column in M with
a 1 in row v_1.

Suppose (2) holds. If G' = G\{e}, let M' denote the permutation M_2, vv_2, vv_3, M_3,
M_1. Since v now has degree two, no fill-ins are required in its row. Because of the way
M_2 and M_3 were chosen, any column that requires a new fill-in in row v_2 or row v_3 has
cost at most k + 1, and so M' has cost k + 1. If G' is obtained from G by contracting
e to v_1, let M' denote the permutation M_2, v_1v_2, M_1, v_1v_3, M_3. Only the v_i rows may
require additional fill-ins. Again, because of the way each M_i was chosen, any new
fill-in must lie in a column that has cost at most k + 1, and so M' has cost k + 1.

In any case, M' is a permutation for G', and thus the cost of G' is strictly less than
that of G. □
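
The smallest instance of this construction takes three copies of K_2 (the obstruction for 1-GML) and yields S(K_{1,3}), one of the two obstructions for 2-GML. The sketch below builds that graph and tests it by brute force. It is a sketch under stated assumptions: the cost model places fill-ins between the leftmost and rightmost 1 of each row, graphs are simple (parallel edges created by a contraction are merged), and, as in the proof, only single-edge deletions and contractions are examined, which is adequate for connected graphs. All function names are illustrative.

```python
from itertools import permutations

def permutation_cost(edge_order):
    first, last = {}, {}
    for col, edge in enumerate(edge_order):
        for vertex in edge:
            first.setdefault(vertex, col)
            last[vertex] = col
    return max(
        (sum(first[v] <= col <= last[v] for v in first)
         for col in range(len(edge_order))),
        default=0,
    )

def graph_cost(edges):
    """Brute-force minimum cost; suitable only for very small graphs."""
    return min(permutation_cost(order) for order in permutations(edges))

def contract(edges, edge):
    """Contract `edge` to its first endpoint, dropping loops and parallel edges."""
    u, v = edge
    merged = set()
    for a, b in edges:
        a, b = (u if a == v else a), (u if b == v else b)
        if a != b:
            merged.add(frozenset((a, b)))
    return [tuple(sorted(e)) for e in merged]

def is_obstruction(edges, k):
    """Cost exceeds k, but every one-edge deletion or contraction has cost at most k."""
    if graph_cost(edges) <= k:
        return False
    for e in edges:
        deleted = [f for f in edges if f != e]
        if graph_cost(deleted) > k or graph_cost(contract(edges, e)) > k:
            return False
    return True

def join_with_hub(parts, attach, hub="v"):
    """Lemma 6.1 construction: disjoint parts plus a hub adjacent to one vertex of each."""
    return [e for part in parts for e in part] + [(hub, a) for a in attach]

K2_COPIES = [[("a1", "b1")], [("a2", "b2")], [("a3", "b3")]]
G = join_with_hub(K2_COPIES, ["a1", "a2", "a3"])
print(graph_cost(G), is_obstruction(G, 2))
```

With k = 1 the lemma predicts cost k + 2 = 3 and an obstruction for 2-GML, which is what the check is meant to confirm.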

Lemma 6.2. Let G_1, G_2, and G_3 denote disjoint (but not necessarily distinct) graphs
of cost k, and let u_i and v_i denote arbitrary (but not necessarily distinct) vertices of
G_i for 1 ≤ i ≤ 3. The graph G = G_1 ∪ G_2 ∪ G_3 ∪ {v_1v_2, u_1v_3, u_2u_3} has cost at
least k + 1.

Proof. Let G_i, u_i, v_i and G be as defined in the statement of the lemma. Let M denote
an optimal permutation for G and, in M, let c_i denote a column of G_i with cost at least
k in the rows of G_i. Without loss of generality, assume c_1 lies to the left of c_2 which lies
to the left of c_3. If u_1v_3 lies to the left of c_2, then c_2 has a fill-in in some row of G_3.
Otherwise, it has a fill-in in some row of G_1. Thus the cost of G is at least k + 1. □

The graph G just defined may not, however, have cost exactly k + 1, even if u_i = v_i,
1 ≤ i ≤ 3. An example is illustrated in Fig. 3.

Corollary 6.3. Let G_1, G_2, and G_3 denote obstructions for (k - 1)-GML, and let v_i
denote an arbitrary vertex of G_i for 1 ≤ i ≤ 3. The graph G = G_1 ∪ G_2
∪ G_3 ∪ {v_1v_2, v_1v_3, v_2v_3} has cost exactly k + 1.

The graph G just defined may not, however, be an obstruction for k-GML. For
example, obstruction 12.3.1 listed in the appendix is properly contained in the graph
constructed by setting G_1 = G_2 = K_3 and setting G_3 = S(K_{1,3}), with v_3 a vertex of
degree one.
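
A smaller illustration of the same phenomenon, not taken from the paper, uses three copies of K_2 (obstructions for 1-GML, so k = 2 in Corollary 6.3). The resulting graph is a triangle with one pendant vertex on each corner. The sketch below, which assumes fill-ins between the leftmost and rightmost 1 of each row and uses hypothetical helper names, confirms by brute force that it has cost 3 = k + 1, and that the triangle it properly contains already has cost 3, so the graph is not minimal for 2-GML.

```python
from itertools import permutations

def permutation_cost(edge_order):
    first, last = {}, {}
    for col, edge in enumerate(edge_order):
        for vertex in edge:
            first.setdefault(vertex, col)
            last[vertex] = col
    return max(
        (sum(first[v] <= col <= last[v] for v in first)
         for col in range(len(edge_order))),
        default=0,
    )

def graph_cost(edges):
    """Brute-force minimum cost; suitable only for very small graphs."""
    return min(permutation_cost(order) for order in permutations(edges))

def triangle_join(parts, attach):
    """Corollary 6.3 construction: disjoint parts plus a triangle on the chosen vertices."""
    v1, v2, v3 = attach
    triangle = [(v1, v2), (v1, v3), (v2, v3)]
    return [e for part in parts for e in part] + triangle, triangle

K2_COPIES = [[("v1", "p1")], [("v2", "p2")], [("v3", "p3")]]
G, TRIANGLE = triangle_join(K2_COPIES, ["v1", "v2", "v3"])

print(graph_cost(G))         # cost of the joined graph
print(graph_cost(TRIANGLE))  # cost of the K_3 it properly contains
```

Because the two costs coincide, removing the pendant edges does not lower the cost, which is exactly why the joined graph fails to be an obstruction.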


Fig. 3. A graph of cost 5 built from three graphs of cost 3.

Lemma 6.4. Let G_1, G_2, G_3, G_4 and G_5 denote disjoint (but not necessarily distinct)
obstructions for k-GML, and let u_i denote an arbitrary vertex of G_i for 1 ≤ i ≤ 5. Let C_5
denote a cycle graph of order five, with vertex set {v_i : 1 ≤ i ≤ 5}, disjoint from
G_1 ∪ G_2 ∪ G_3 ∪ G_4 ∪ G_5. The graph G = G_1 ∪ G_2 ∪ G_3 ∪ G_4 ∪ G_5 ∪ C_5 ∪ {u_iv_i : 1 ≤ i ≤ 5}
is an obstruction for (k + 2)-GML.

Proof. Let G_i, u_i, C_5, and G be as defined in the statement of the lemma.
Let M denote an optimal permutation for G and, in M, let c_i denote a
column of G_i with cost at least k + 1 in the rows of G_i. If any c_i lies between
the leftmost and rightmost columns of C_5, then it incurs at least two additional
fill-ins (in rows of C_5). Otherwise, without loss of generality, assume c_1 lies to
the left of c_2 which lies to the left of c_3 which lies to the left of the leftmost column
of C_5. In this event, c_3 incurs two additional fill-ins (one in a row of G_1 ∪ {v_1},
and one in a row of G_2 ∪ {v_2}). Thus the cost of G is at least k + 3.

Letting M_i denote a cost k + 1 permutation for G_i in which every column with cost
k + 1 has a 1 or a fill-in in row u_i, we observe that G has cost exactly k + 3 as
evidenced by the permutation M_1, u_1v_1, M_2, u_2v_2, v_1v_2, v_2v_3, u_3v_3, M_3, v_3v_4, v_4v_5,
v_1v_5, u_4v_4, M_4, u_5v_5, M_5.

We now establish the minimality of G. As in the proof of Lemma 6.1, we need only
consider the effect of removing or contracting a single edge, e, and may assume that
either (1) e is in G_1, (2) e = u_1v_1, or (3) e = v_1v_2.

Suppose (1) holds. Let G_1' and G' denote the minors of G_1 and G, respectively,
that are obtained by the removal or contraction of e (if a contraction, e is contracted
to u_1 if u_1 ∈ e). Because G_1 is minimal for parameter k, G_1' possesses a permutation
M_1' with cost at most k. Let M_1'' denote the matrix formed at no extra cost
from M_1' by adding row v_1 and placing column u_1v_1 adjacent to an arbitrary
column with a 1 in row u_1. Let M' denote the permutation M_2, u_2v_2, M_3, u_3v_3,
v_2v_3, v_1v_2, M_1'', v_3v_4, v_1v_5, v_4v_5, u_4v_4, M_4, u_5v_5, M_5, which has cost at
most k + 2.

Suppose (2) holds. If G' = G\{e}, let M' denote the permutation M_2, u_2v_2, M_3,
u_3v_3, v_2v_3, v_1v_2, v_3v_4, v_4v_5, v_1v_5, u_4v_4, M_4, u_5v_5, M_5, M_1, which has cost at most
k + 2. If G' is obtained from G by contracting e to u_1, let M' denote the permutation
M_2, u_2v_2, M_3, u_3v_3, v_2v_3, v_2u_1, M_1, u_1v_5, v_4v_5, v_3v_4, u_4v_4, M_4, which has
cost at most k + 2.

Suppose (3) holds. If G' = G\{e}, let M' denote the permutation M_2, u_2v_2, v_2v_3,
u_3v_3, M_3, v_3v_4, u_4v_4, M_4, v_4v_5, u_5v_5, M_5, M_1, which has cost at most
k + 2. If G' is obtained from G by contracting e to v_1, let M' denote the permutation
M_1, u_1v_1, M_2, u_2v_1, M_3, u_3v_3, v_1v_3, v_3v_4, v_1v_5, v_4v_5, u_4v_4, M_4, u_5v_5, M_5, which has
cost at most k + 2.

In any case, M' is a permutation for G', and thus the cost of G' is strictly less than
that of G. □


7. Special tools for three-track obstructions

Unlike  the  work  of  the  last  two  sections,  the  results  we  now  derive  hold  only  for 
k  =  3. 

7.1.  General  properties  of  the  three-track  obstructions 

Lemma  7.1.  No  obstruction  for  3-GML  contains  a  vertex  of  degree  four  or  more 
adjacent  to  a  pendant  vertex. 

Proof. Assume otherwise, and let G denote an obstruction for 3-GML with vertex v
adjacent to vertices w, x, y and pendant vertex z. Let G' = G\{z}, and let M' denote
a cost-three permutation for G'. Without loss of generality, assume that column vw lies to
the left of both vx and vy, and that column vw has a 0 in row x. Let c denote the column
that contains the leftmost 1 in row x. We construct matrix M from M' by adding row
z and placing column vz to the immediate left of c. M is a permutation for G with cost at
most three, contradicting our assumption that G has no three-track layout. □

This  result  (aided  by  the  corollaries  to  Lemma  5.6)  is  easily  extended. 

Corollary  7.2.  No  obstruction  for  3-GML  contains  two  adjacent  vertices  each  adjacent 
to  a  pendant  vertex. 

Given  a  permutation  for  a  plane  graph,  the  overlap  of  two  or  more  face  spans  is  the 
collection  of  columns  common  to  all  spans. 

Lemma 7.3. If a plane graph of cost three contains two faces whose intersection is
exactly one vertex, then it possesses an optimal permutation in which the overlap of the
spans for these faces is empty.

Proof. Let G denote a plane graph of cost three with faces F_1 and F_2 such that
F_1 ∩ F_2 = v. Let M denote a cost-three permutation for G, and suppose the overlap of
the face spans for F_1 and F_2 is nonempty. Because these faces are not edge adjacent,
their overlap contains at least two columns, each with cost two in the rows of F_1, and
each with cost two in the rows of F_2 (Lemma 5.18). Since M has cost three, and since
v is the only vertex on both F_1 and F_2, each column of the overlap represents an edge
of either F_1 or F_2 that is incident on v.

Without loss of generality, assume the leftmost column of the overlap is vw of F_2,
with a fill-in in row u of F_1. Since the cost of M is three, the column to the immediate
right of vw must be uv. If vx of F_1 is to the right of uv, then vw requires a fill-in in row
x as well, contradicting the fact that M has cost three. Therefore, the overlap contains
only vw and uv, and interchanging the two columns yields a cost-three permutation for
G with the desired property. □

Lemma 7.4. If a plane graph of cost three contains two faces whose intersection is
exactly one edge, then it possesses an optimal permutation in which the overlap of the
spans for these faces is exactly one column.

Proof. Let G denote a plane graph of cost three with faces F_1 and F_2 such that
F_1 ∩ F_2 = uv. Given a cost-three permutation for G, suppose the overlap of the face
spans for F_1 and F_2 contains two or more columns (it cannot be empty because it
must contain uv). Moreover, suppose the overlap contains no pendant edges incident
on u or v (any such edge can be removed initially, then reinserted after our forthcoming
permutation modification at no extra cost).

Without loss of generality, assume that the rightmost column of F_1 lies to the right
of both uv and the leftmost column of F_2. It is straightforward to verify that the
overlap contains at most three columns, that uv and the leftmost column of F_2 are the
same, and that the column to the immediate right of uv must have the form uw (or vw)
for some w ∈ F_1. Thus column uv must have a fill-in in row w, uw (or vw) must have
a fill-in in row v (or u), and so uv and uw (or vw) can be interchanged at no extra cost,
an action which eliminates a column from the overlap. At most one more application
of this interchange reduces the overlap to uv alone. □

7.2.  Three-track  obstructions  that  are  not  outerplanar 

Since K_4, an obstruction for 3-GML, is a minor of both K_5 and K_{3,3}, all
obstructions for 3-GML are planar. We now establish that K_4 and the four graphs
illustrated in Fig. 4 are the only obstructions for 3-GML that are not outerplanar.

Lemma 7.5. The four graphs depicted in Fig. 4 are the only obstructions for 3-GML
with the property that, for any planar embedding, there exists an edge not adjacent to the
exterior face.

Proof.  Computation  suffices  to  check  that  these  four  graphs  are  indeed  obstructions 
for  3-GML;  clearly,  each  has  the  property  stated  in  the  lemma.  Thus  we  need  only  to 
establish  that  these  are  the  only  obstructions  for  3-GML  that  possess  this  property. 

Let G = (V, E) denote an arbitrary plane obstruction for 3-GML with the desired
property,  and  assume  without  loss  of  generality  that  its  embedding  maximizes  the 
number  of  edges  on  or  adjacent  to  the  exterior  face.  Let  Vf  denote  the  set  of  vertices  on 
this  exterior  face,  and  let  Vn  denote  V\Vf.  Let  G'  denote  the  subgraph  of  G  induced  by 
Vn.  Thus  G'  contains  at  least  one  edge,  uv. 


Fig. 4. Four related obstructions.

Let S denote the set of (simple) paths in G with an initial vertex in {u, v}, internal
vertices in Vn, and a terminal vertex in Vf. If three or more distinct terminals are
contained in the elements of S, then G >m K4, contradicting the presumed minimality
of G. If every element of S contains the same terminal, then the connected component
of G' containing uv can be moved to the exterior face, contradicting the presumed
maximality of (the number of edges on or adjacent to) that face. Thus the elements of
S contain exactly two different terminals, which we denote by w and x.

It now follows that G contains three vertex-disjoint paths from w to x. Moreover,
the maximality of the exterior face dictates that each path either has length at least
three, or contains an internal vertex adjacent to a distinct, additional vertex not on
any of the three paths. Therefore G ≥m H for some H depicted in Fig. 4. □

Lemma  7.6.  No  obstruction  for  3-GML  contains  a  vertex  of  degree  two  adjacent  to 
vertices  of  degree  three  or  more  unless  those  vertices  are  also  adjacent. 

Proof. Assume otherwise, and let G denote a plane obstruction for 3-GML with
degree two vertex v adjacent to vertices u and w, each of degree three or more, but not
adjacent to each other. Lemma 7.1 and Corollary 5.9 guarantee that neither u nor w is
adjacent to a pendant vertex. Let G' denote the minor of G obtained by contracting
edge uv to w, and let M' denote a cost-three permutation for G'. Consider the overlap
of the spans for u and w, and without loss of generality, assume the leftmost column is
uw and that it contains the leftmost 1 in row w. If the overlap is uw, or if uw has cost
two, adding row v and replacing uw with uv and vw produces a cost-three permutation
for G, a contradiction. If uw has a fill-in in row x, it is straightforward to verify that
some column of the overlap contains the rightmost 1 in row x, or that the overlap
contains at most three columns, one of which is ux. In either case, a cost-three
permutation for G can be constructed from M', again contradicting the assumption that
G has no three-track layout. Therefore, an obstruction for 3-GML contains a vertex of
degree two adjacent to vertices of degree three or more only if (as obstruction 6.4.1 in
the appendix illustrates) the three vertices are pairwise adjacent. □

Lemma  7.7.  K4  and  the  graphs  depicted  in  Fig.  4  are  the  only  obstructions  for  3-GML 
that  are  not  outerplanar. 

Proof. Assume otherwise. Let G denote a nonouterplanar obstruction for 3-GML
other than one of the five noted in the statement of the lemma. Thus, due to Lemma
7.5, there is at least one embedding of G in which every edge is adjacent to the exterior
face. From the embeddings of G with this property, select one that maximizes the
number of vertices on the exterior face, and let v denote a vertex that is not on this
face. It must be that v has degree two, since otherwise G ≥m K4 due to the way the
embedding was chosen. Let u and w denote the vertices adjacent to v. The maximality
of the embedding ensures that G contains three edge-disjoint paths of length two or
more between u and w. Moreover, Lemma 7.6 implies that uw ∈ G.

Consider this embedding restricted to G' = G\{v}. There are faces F1 and F2 in G'
such that F1 ∩ F2 = uw. Let M' denote a cost-three permutation for G' in which, due
to Lemma 7.4, the overlap of the face spans for F1 and F2 is uw.

If  uw  contains  no  fill-in,  then  we  construct  a  new  matrix  M  from  M'  by  adding  row 
v  and  placing  columns  uv  and  vw  to  the  immediate  left  of  uw. 

If uw contains a fill-in in some row x, then it follows that ux and wx must exist,
contain the only 1s in row x, and lie immediately to each side of uw. In this case, we
construct a new matrix M from M' by adding row v and placing columns uv and vw to
the immediate left of the column to the immediate left of uw.

In  either  case,  M  is  a  cost-three  permutation  for  G,  contradicting  the  assumption 
that  G  is  an  obstruction  for  3-GML.  □ 

7.3.  Additional  properties  of  three-track  obstructions 

We shall henceforth consider only outerplane obstructions and outerplanar
embeddings in which all vertices lie on the exterior face. Thus the intersection of two
faces is at most a single edge.

Lemma 7.8. If an obstruction for 3-GML contains two faces that are adjacent at and
only connected through a single vertex, then at least one of these faces is a triangle with
two vertices of degree two.

Proof. Let G denote an obstruction, with faces F1 and F2 adjacent at and only
connected through v. Assume neither F1 nor F2 is a triangle with two vertices of
degree two.

Let C1 denote the (unique) connected component of G\{v} that contains an edge
of F1\{v}. Let C2 denote (G\{v})\C1. Let u and w denote a pair of isolated vertices
not in G. We define G1 = (G\C2) ∪ {u, w} ∪ {uv, vw, uw} and G2 = (G\C1) ∪ {u, w} ∪
{uv, vw, uw}.

Observe that both G1 and G2 are proper minors of G and both, therefore, have
cost-three permutations. It is straightforward to show that G1 must possess an
optimal permutation M1 with the three columns of {u, v, w} on the extreme right, else
G properly contains an obstruction as described in Lemma 6.1. Similarly, G2 must
possess an optimal permutation M2 with the three columns of {u, v, w} on the extreme
left.

But this means that we can construct a cost-three permutation for G by placing
M2 to the right of M1 and removing the (six) columns of {u, v, w}. This contradicts the
fact that G is an obstruction, however, and so the assumption that neither F1 nor F2 is
a triangle with two vertices of degree two cannot hold. □

Let v denote a vertex on a face of an outerplane graph G. If the connected
component of G\{uv | u lies on a face} that contains v has at least one edge, then we
term this component the attachment at v.


N.G.  Kinnersley,  M.A.  Langston  /  Discrete  Applied  Mathematics  54  (1994)  169-213  185 

Lemma 7.9. If an obstruction for 3-GML contains a face in which two or more vertices
have attachments, then each attachment is a minor of S(K1,3).

Proof. Assume otherwise for obstruction G, in which vertices u and v of face F have
attachments, with the attachment at v, A(v), not a minor of S(K1,3).

No vertex of A(v) has degree greater than three unless A(v) contains a cycle
(Lemmas 5.4 and 7.1). No degree-three vertex of A(v) is adjacent to both a vertex of
degree two and a pendant vertex unless that vertex is v (Corollary 5.9). No degree-two
vertex of A(v) is adjacent to two vertices of degree three (Lemma 7.6). It follows that
either A(v) contains a cycle or S(K1,3) <m A(v), and thus A(v) has cost three.

Let A+ = A(v) ∪ {u, w} ∪ {uv, vw, uw}. Let A- = A(v)\{v}. It is straightforward to
show that A+ possesses an optimal permutation M1 in which uvw (see footnote 7) is the
rightmost column. Let G' = (G\A-) ∪ {x, y} ∪ {xv, vy, xy}. If A(v) contains a cycle, then
G' <m G, and thus G' has cost three. If A(v) is acyclic, then S(K1,3) <m A(v), and thus
(with the help of Lemma 5.16) again G' has cost three. Now it is straightforward to
show that G' possesses an optimal permutation M2 in which vxy is the leftmost
column. But this means we can construct a cost-three permutation for G by placing
M2 to the right of M1 and removing uvw and vxy, a contradiction. □

Lemma 7.10. If an obstruction for 3-GML contains two faces that are adjacent at and
only connected through a single vertex, then there is an obstruction for 3-GML with one
less face and with a vertex whose attachment is two or three pendant paths.

Proof. Let G denote an obstruction with faces F1 and F2 adjacent at and only
connected through v. Assume F1 is a triangle in which only v has degree three or more
(Lemma 7.8). Let H denote the graph obtained from G by deleting F1\{v} and
identifying the degree-three vertex of (a disjoint copy of) S(K1,3) with v. H has cost
four (Lemma 5.16). Let G' denote an obstruction contained in H. Observe that, in G',
the attachment at v contains more than one pendant path, else G' <m G. Thus, due to
Corollary 5.17, either G' = H or G' = H\{vx, xy}, where x and y are vertices on
a pendant path incident on v, and the lemma holds. □

Corollary  7.11.  If  an  obstruction  for  3-GML  contains  two  faces  that  are  adjacent  at  and 
only  connected  through  a  single  vertex  v,  then  v  has  no  attachment. 

We  say  that  two  disjoint  faces  are  separated  if  the  removal  of  some  edge  places  the 
faces  in  different  connected  components. 

Lemma 7.12. If an obstruction for 3-GML contains a pair of separated faces, then the
obstruction is one obtained from Lemma 6.1.


7 As justified by Lemma 5.13, we shall from now on represent a triangular face by a column with three 1s
rather than three columns each with two 1s.

Proof. Assume otherwise for obstruction G with separated faces F1 and F2. Let uv
denote an edge of G whose removal places F1 and F2 in distinct connected
components C1 and C2, respectively. Assume u ∈ C1 and v ∈ C2. C1 must possess an optimal
permutation M1 in which every column to the right of the span for u has cost two, else
C1\{u} contains two disjoint obstructions for 2-GML and the minimality of G
ensures that it is obtained from Lemma 6.1. Similarly, C2 must possess an optimal
permutation M2 in which every column to the left of the span for v has cost two. But
now M1, uv, M2 is a cost-three permutation for G, a contradiction. □

7.4. Nonextendability of these results to four or more tracks

Unfortunately,  the  results  of  this  section  cannot  be  extended  to  values  of  k  >  3. 
Consider,  for  example,  the  graph  depicted  in  Fig.  5.  We  know  from  Lemma  6.1  that  it 
is  an  obstruction  for  4-GML. 

Clearly,  analogs  of  Lemmas  7.1  and  7.6  are  ruled  out  by  uv  and  w.  Similarly,  Lemma 
6.2  quickly  gives  rise  to  obstructions  for  4-GML  that  eliminate  analogs  for  Lemmas 
7.8  and  7.9.  More  complicated  constructions  [12]  can  be  devised  to  rule  out  analogs 
for  Lemmas  7.3,  7.4  and  7.12. 

8.  The  complete  three-track  obstruction  set 

In  this  section,  we  shall  complete  the  task  of  identifying  all  obstructions  for  3-GML. 
Each  is  given  a  three-integer  name,  denoting  its  number  of  vertices,  its  number  of 
interior  faces  and  an  index.  For  example,  obstruction  8.2.3  is  the  third  obstruction  we 
list  with  eight  vertices  and  two  faces.  For  the  reader’s  convenience,  the  entire  set  is 
displayed  in  an  appendix  to  this  paper. 

8.1.  Obstructions  from  previous  constructions 

Lemma 6.1 provides twenty obstructions: ten are trees (22.0.1-10); six have one face
(18.1.1-6); four have separated faces (10.3.1 and 14.2.1-3).

Lemma 6.2 provides forty-three more obstructions (6.4.1, 8.3.1, 9.4.1-2, 11.2.1-2,
11.3.1, 12.3.1, 13.2.1, 13.3.1-6, 15.1.1-4, 15.2.1-7, 16.2.1, 16.2.5-6, 17.1.1-3, 17.2.1,
17.2.3-4, 18.1.7, 18.1.9-10, 19.1.1-3, 20.1.1 and 21.1.1).

Lemma  6.4  provides  one  additional  obstruction  (15.1.5). 

Therefore,  including  the  five  nonouterplanar  obstructions  identified  in  Section  7, 
sixty-nine  obstructions  for  3-GML  are  known  up  to  this  point. 
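
The tally above can be checked mechanically. The following is a minimal sketch, not part of the original paper: the helper `expand` is a hypothetical name introduced here, and it assumes only the naming convention stated at the start of this section (vertices, interior faces, index) together with the range shorthand used in the lists above.

```python
def expand(names):
    """Expand a comma-separated list such as '13.3.1-6' into single names.

    Hypothetical helper: a trailing '-k' means 'indices up to k'.
    """
    out = []
    for entry in names.replace(" ", "").split(","):
        head, _, last = entry.partition("-")
        n, f, first = head.split(".")
        stop = int(last) if last else int(first)
        out.extend(f"{n}.{f}.{i}" for i in range(int(first), stop + 1))
    return out


lemma_6_1 = expand("22.0.1-10, 18.1.1-6, 10.3.1, 14.2.1-3")
lemma_6_2 = expand(
    "6.4.1, 8.3.1, 9.4.1-2, 11.2.1-2, 11.3.1, 12.3.1, 13.2.1, 13.3.1-6, "
    "15.1.1-4, 15.2.1-7, 16.2.1, 16.2.5-6, 17.1.1-3, 17.2.1, 17.2.3-4, "
    "18.1.7, 18.1.9-10, 19.1.1-3, 20.1.1, 21.1.1"
)
lemma_6_4 = expand("15.1.5")
nonouterplanar = 5  # K4 plus the four graphs of Fig. 4 (Section 7)

total = len(lemma_6_1) + len(lemma_6_2) + len(lemma_6_4) + nonouterplanar
print(len(lemma_6_1), len(lemma_6_2), len(lemma_6_4), total)
```

The three list lengths and the total agree with the counts of twenty, forty-three, one and sixty-nine stated in the text.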

8.2. Conventions for describing new obstructions

We  know  from  [5, 12]  and  Lemma  7.12  that  no  more  tree  or  separated-face 
obstructions  are  possible.  Moreover,  those  with  vertex-adjacent  faces  can  be  obtained 
indirectly  with  Lemma  7.10.  Thus  we  now  consider  only  outerplane  graphs  with  either 
a  single  face  or  with  two  or  more  edge-adjacent  faces.  Without  loss  of  generality,  we 
assume  the  outerplane  embedding  induces  a  left-to-right  ordering  of  the  faces,  so  that 
we  can  employ  a  simple  (decimal)  integer  pattern  to  denote  its  face  structure.  In  such 
a  pattern,  the  number  of  digits  equals  the  number  of  faces,  and  the  value  of  each  digit 
equals  the  number  of  vertices  in  the  corresponding  face.  (As  we  shall  see  later,  this  easy 
scheme  suffices,  because  we  need  only  consider  candidate  obstructions  in  which  no 
interior  face  has  more  than  six  vertices.) 

If a face contains four or more vertices, then we assume each vertex of the face has
degree at least three (Lemmas 5.5, 5.6 and 7.6). If a vertex has an attachment, then we
assume this attachment is either a pendant edge or one, two or three pendant paths
(Lemma 7.9). If the attachment consists of three pendant paths, then
a minimality-preserving replacement is possible thanks to Lemmas 5.16 and 7.6. We term this
a type 1 replacement. If the attachment is a pendant edge, then a minimality-preserving
replacement is possible thanks to Lemma 5.6. We term this a type 2 replacement. Fig. 6
illustrates these two replacements, which we shall use to identify obstructions that
might otherwise be missed due to the assumptions just stated.


Fig. 6. Type 1 and 2 replacements.

We can thus use a succinct (character) string to denote a graph's attachment
structure. We begin by visiting the vertices that lie on any internal face clockwise
around the external face. If two or more (internal) faces are present, then we start with
the vertex at the "top" of the edge shared by the leftmost two faces; otherwise we start
at an arbitrary vertex. Letting v_i denote the ith vertex visited in this fashion, we
represent the attachment at v_i with the ith character of the string. Such a character is
either a 0 to denote that there is no attachment, the letter e to denote that it is
a pendant edge, or an integer in the range [1, 3] to denote the number of pendant
paths it contains.

New obstruction candidates are now uniquely (modulo rotations and
reflections) describable in pattern-string form. For example, the graph denoted by
34-2e300 contains a triangle, edge adjacent to a square to its right. These two faces
share the edge v1v4. The triangle's vertex set is {v1, v4, v5}. The attachments at
vertices v1, v2, and v3 are, respectively, two pendant paths, a pendant edge and three
pendant paths.
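
As an illustration of the pattern-string convention, the sketch below recovers the first two integers of an obstruction's name (number of vertices, number of interior faces) from its pattern-string. It is not taken from the paper. The function name `describe` is hypothetical, and two assumptions are built in: consecutive faces in the left-to-right ordering share exactly one edge, and every pendant path contributes two vertices (consistent with the remark that three pendant paths have six edges).

```python
def describe(pattern_string):
    """Return (vertex count, face count) for a pattern-string like '34-2e300'."""
    faces, attachments = pattern_string.split("-")
    face_sizes = [int(d) for d in faces]

    # Assumption: each pair of consecutive faces shares one edge, so two
    # vertices are counted twice per shared edge.
    face_vertices = sum(face_sizes) - 2 * (len(face_sizes) - 1)
    if len(attachments) != face_vertices:
        raise ValueError("one attachment character is needed per face vertex")

    extra = 0
    for ch in attachments:
        if ch == "e":
            extra += 1            # a pendant edge adds one vertex
        elif ch in "0123":
            extra += 2 * int(ch)  # assumption: two vertices per pendant path
        else:
            raise ValueError(f"unexpected attachment character {ch!r}")
    return face_vertices + extra, len(face_sizes)


print(describe("34-2e300"))  # the example discussed in the text
print(describe("34-02200"))  # pattern-string of known obstruction 13.2.1
```

Under these assumptions, the pattern-strings that the text pairs with named obstructions (for example 34-02200 with 13.2.1, 4-2221 with 18.1.8 and 3-333 with 21.1.1) give matching vertex and face counts.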

In describing permutations of graphs, we adopt the convention that u_i denotes the
other vertex of an edge pendant at v_i. It is also helpful to use a shorthand for (complete
and partial) permutations of more complicated attachments. See Fig. 7. For example,
if three pendant paths are incident on v, then we use A(v) in a permutation to indicate
that the six edges of the attachment are to be placed in the order listed.

8.3. Obstructions with one face

Triangular face. If two vertices of the face have degree two, then it is
straightforward to show that the graph can be obtained from Lemma 6.1. Otherwise, since the
attachments at the vertices of the face are minors of S(K1,3), the graph can be obtained
from Lemma 6.2. Hereafter, we shall not consider any string that contains 333, 3321,
3312, 3213, 3123, 2133 or 1233, since the corresponding graph contains a minor whose
pattern-string is 3-333 (known obstruction 21.1.1).

Square  face.  Pattern-string  4-2221  denotes  new  obstruction  18.1.8.  Pattern-string 
4-232e  represents  known  obstruction  19.1.2.  Any  other  graph  with  this  pattern  either 
contains  one  of  these  obstructions,  or  is  a  minor  of  a  graph  whose  cost-three 
permutation  resides  in  the  following  list. 

4-3131  A(v1), C1(v2), v1v2, v2v3, v1v4, v3v4, C2(v4), A(v3)

4-323e  A(v1), B1(v2), v1v2, v1v4, u4v4, v3v4, v2v3, B2(v2), A(v3)

4-3311  A(v1), C1(v4), v1v4, v1v2, v3v4, v2v3, C2(v3), A(v2)
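
A listed permutation can be checked by expanding the shorthand and computing its cost. The sketch below does this for pattern-string 4-3131. It is an illustration rather than the authors' procedure, and it assumes the usual gate matrix layout cost: each row (vertex) spans from its leftmost to its rightmost 1, fill-ins included, and the cost of a permutation is the largest number of rows spanning any single column. The column orders in `A`, `C1` and `C2` are read off the matrices in Fig. 7; the auxiliary vertex names are hypothetical.

```python
def A(v):
    """Three pendant paths at v, in the column order of Fig. 7."""
    a, b, c, d, e, f = (f"{x}_{v}" for x in "abcdef")
    return [(a, b), (b, v), (c, d), (d, v), (f, v), (e, f)]


def C1(v):
    """One pendant path at v, pendant end first."""
    a, b = f"p_{v}", f"q_{v}"
    return [(a, b), (b, v)]


def C2(v):
    """One pendant path at v, pendant end last."""
    return C1(v)[::-1]


def cost(columns):
    """Maximum number of row spans covering a column (assumed cost measure)."""
    first, last = {}, {}
    for j, col in enumerate(columns):
        for x in col:
            first.setdefault(x, j)  # leftmost 1 in row x
            last[x] = j             # rightmost 1 in row x
    return max(
        sum(1 for x in first if first[x] <= j <= last[x])
        for j in range(len(columns))
    )


# 4-3131: A(v1), C1(v2), v1v2, v2v3, v1v4, v3v4, C2(v4), A(v3)
perm = (
    A("v1") + C1("v2")
    + [("v1", "v2"), ("v2", "v3"), ("v1", "v4"), ("v3", "v4")]
    + C2("v4") + A("v3")
)
print(len(perm), cost(perm))
```

Because a column is just a tuple of vertices, a triangular face written as a single column with three 1s (footnote 7) can be passed as a 3-tuple without changing the function.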

Attachment   Notation   Permutation

A            A(v)       v  0 1 * 1 1 0
                        a  1 0 0 0 0 0
                        b  1 1 0 0 0 0
                        c  0 0 1 0 0 0
                        d  0 0 1 1 0 0
                        e  0 0 0 0 0 1
                        f  0 0 0 0 1 1

B            B1(v)      v  0 1
                        a  1 0
                        b  1 1

             B2(v)      v  1 0
                        c  0 1
                        d  1 1

C            C1(v)      v  0 1
                        a  1 0
                        b  1 1

             C2(v)      v  1 0
                        a  0 1
                        b  1 1

Fig. 7. Shorthand used in permutations.

Pentagonal face. Pattern-strings 5-11111 and 5-22e1e correspond to known
obstructions 15.1.5 and 17.1.1, respectively. Any new obstruction with a pentagonal face
contains at least one, and at most two pendant edges. If a string has a single e and
a single 1, then the corresponding graph contains obstruction 18.1.8 (4-2221). Any
other candidate obstruction is a minor of a graph whose cost-three permutation
resides in the following list.

5-3131e  A(v1), C1(v2), v1v2, v1v5, u5v5, v4v5, v3v4, v2v3, C2(v4), A(v3)

5-3113e  A(v1), C1(v2), v1v2, v1v5, u5v5, v4v5, v3v4, v2v3, C2(v3), A(v4)

5-1331e  A(v2), C1(v1), v2v3, v1v2, v1v5, u5v5, v4v5, v3v4, C2(v4), A(v3)

Hereafter,  no  string  with  five  or  more  entries  from  {1, 2,  3}  will  be  considered,  because 
the  corresponding  graph  contains  known  obstruction  15.1.5. 

Hexagonal face. If three vertices of a graph with a hexagonal face have pendant
edges incident on them, then the graph contains known obstruction 15.1.1 (6-1e1e1e).
Thus we need only consider strings whose two e characters are in the third and sixth
positions. The graph with pattern-string 6-22e11e contains known obstruction 17.1.1
(5-22e1e). All other possibilities are minors of a graph whose cost-three permutation
resides in the following list.

6-31e31e  A(v1), C1(v2), v1v2, v2v3, u3v3, v1v6, u6v6, v5v6, v3v4, v4v5, C2(v5), A(v4)

6-13e31e  A(v2), C1(v1), v1v2, v2v3, u3v3, v1v6, u6v6, v5v6, v3v4, v4v5, C2(v5), A(v4)

Other faces. Any graph that contains a face with seven or more vertices, each with
an attachment, must contain either known obstruction 15.1.1 (6-1e1e1e) or known
obstruction 15.1.5 (5-11111). An obstruction whose face contains seven or more
vertices must therefore have adjacent vertices of degree two on the face, in which case
the obstruction can be obtained from a type 2 replacement and has already been
considered.

Lemma  8.1.  There  are  exactly  23  obstructions  for  3-GML  that  contain  only  one  face. 

In  summary,  only  one  new  one-faced  obstruction  exists,  bringing  the  total  number 
of  known  obstructions  up  to  70. 

8.4.  Obstructions  with  two  faces 

To identify obstructions with two vertex-adjacent faces, we apply the reverse of the
replacement used in the proof of Lemma 7.10. Table 1 summarizes the two-faced
obstructions thereby obtained. Other two-faced obstructions must contain
edge-adjacent faces.

Table 1
Two-faced obstructions from Lemma 7.10

Starting one-faced    Resultant two-faced
obstruction           obstruction(s)

17.1.1                15.2.4
17.1.2                15.2.5, 15.2.6
17.1.3                15.2.7
18.1.8                15.2.2, 16.2.4^a
18.1.9                16.2.5
18.1.10               16.2.6
19.1.1                15.2.1
19.1.2                15.2.2, 17.2.3
19.1.3                15.2.3, 17.2.4
20.1.1                16.2.1
21.1.1                17.2.1

^a New obstruction.

Two triangles. Pattern-string 33-0232 represents new obstruction 18.2.2, from
which new obstruction 17.2.5 is obtained with a type 1 replacement. Pattern-string
33-3230 denotes new obstruction 20.2.1, from which new obstructions 19.2.1 and
18.2.1 are obtained with type 1 replacements. The graph with pattern-string 33-2221
contains known obstruction 18.1.8 (4-2221). Pattern-string 33-2e22 represents new
obstruction 17.2.6, from which new obstruction 17.2.7 is obtained with a type 2
replacement. All other possibilities are minors of a graph whose cost-three permutation
resides in the following list.

33-0323  A(v2), B1(v3), v1v2v3, v1v3v4, B2(v3), A(v4)

33-1313  A(v2), C1(v1), v1v2v3, v1v3v4, C2(v3), A(v4)

33-3113  A(v1), C1(v2), v1v2v3, v1v3v4, C2(v3), A(v4)

33-3131  A(v1), C1(v2), v1v2v3, v1v3v4, C2(v4), A(v3)

33-3320  A(v2), B1(v3), v1v2v3, v1v3v4, B2(v3), A(v1)

Triangle and square. We assume the square is to the right of the triangle, so that both
v2 and v3 must have attachments. If there is no e in the string, then there is at least one
0 in a position corresponding to a vertex of the triangle. Since known obstruction 13.2.1
has pattern-string 34-02200, we only consider graphs in which v2 or v3 has a pendant
edge or a single pendant path as its attachment. A string with three 2s and a 1
corresponds to a graph that contains known obstruction 18.1.8 (4-2221). Pattern-string
34-21120 represents new obstruction 17.2.2. Pattern-string 34-2e102 denotes new
obstruction 16.2.2, from which new obstruction 16.2.3 is obtained with a type 2
replacement. Pattern-string 34-1111e denotes new obstruction 14.2.7, from which new
obstruction 14.2.8 is obtained with a type 2 replacement. Graphs with pattern-strings 34-2e230
and 34-2e232 contain known obstruction 19.1.2 (4-232e). The graph with pattern-string
34-1e22e contains known obstruction 17.1.1 (5-22e1e). All other possibilities are minors
of a graph whose cost-three permutation resides in the following list.

34-0e323  A(v3), B1(v4), v3v4, v2v3, u2v2, v1v2, v1v4v5, B2(v4), A(v5)

34-01313  A(v3), C1(v2), v2v3, v1v2, v3v4, v1v4v5, C2(v4), A(v5)

34-01331  A(v3), C1(v2), v2v3, v1v2, v3v4, v1v4v5, C2(v5), A(v4)

34-03113  A(v2), C1(v3), v2v3, v1v2, v3v4, v1v4v5, C2(v4), A(v5)

34-03131  A(v2), C1(v3), v2v3, v1v2, v3v4, v1v4v5, C2(v5), A(v4)

34-1e133  A(v4), C1(v3), v3v4, v2v3, u2v2, v1v2, v1v4v5, C2(v1), A(v5)

34-1e313  A(v3), C1(v4), v3v4, v2v3, u2v2, v1v2, v1v4v5, C2(v1), A(v5)

34-11330  A(v3), C1(v2), v3v4, v2v3, v1v2, v1v4v5, C2(v1), A(v4)

34-13130  A(v2), C1(v3), v2v3, v3v4, v1v2, v1v4v5, A(v4)

34-3e131  A(v4), C1(v3), v3v4, v2v3, u2v2, v1v2, v1v4v5, C2(v5), A(v1)

34-3e311  A(v3), C1(v4), v3v4, v2v3, u2v2, v1v2, v1v4v5, C2(v5), A(v1)

34-3e320  A(v3), B1(v4), v3v4, v2v3, u2v2, v1v2, v1v4v5, B2(v4), A(v1)

Two squares. Pattern-string 44-1e101e denotes new obstruction 14.2.4, from which
new obstructions 14.2.5 and 14.2.6 are obtained with type 2 replacements. Graphs
with pattern-strings 44-2e10e2 and 44-0e22e1 contain known obstruction 17.1.1
(5-22e1e). The graph with pattern-string 44-0e2320 contains known obstruction 19.1.2
(4-232e). If a string contains no e, then its first and fourth characters must both be
0 (Lemma 7.1 and avoidance of known obstruction 15.1.5 (5-11111)). Known
obstruction 13.2.1 (34-02200) is a minor of any graph with pattern 44 in which both v2 and v3
(or both v5 and v6) have two or more pendant paths as attachments. All other
possibilities are minors of a graph whose cost-three permutation resides in the
following list.

44-0e1313  A(v4), C1(v3), v3v4, v2v3, u2v2, v1v2, v1v4, v4v5, v1v6, v5v6, C2(v5), A(v6)

44-0e1331  A(v4), [...], C2(v6), A(v5)

44-0e3113  A(v3), C1(v4), v3v4, v2v3, u2v2, v1v2, v1v4, v4v5, v1v6, v5v6, C2(v5), A(v6)

44-0e3131  A(v3), C1(v4), v3v4, v2v3, u2v2, v1v2, v1v4, v4v5, v1v6, v5v6, C2(v6), A(v5)

44-0e323e  A(v3), B1(v4), v3v4, v2v3, u2v2, v1v2, v1v4, v1v6, u6v6, v5v6, v4v5, B2(v4), A(v5)

44-0e331e  A(v3), v3v4, v2v3, u2v2, v1v2, v1v4, v1v6, u6v6, v5v6, v4v5, C2(v5), A(v4)

44-031031  A(v2), C1(v3), v2v3, v1v2, v3v4, v1v4, v1v6, v4v5, v5v6, C2(v6), A(v5)

44-031013  A(v2), C1(v3), v2v3, v1v2, v3v4, v1v4, v1v6, v4v5, v5v6, C2(v5), A(v6)

44-1e31e3  A(v3), C1(v4), v3v4, v2v3, u2v2, v1v2, v1v4, v4v5, u5v5, v5v6, v1v6, C2(v1), A(v6)

44-3e13e1  A(v4), C1(v3), v3v4, v2v3, u2v2, v1v2, v1v4, v4v5, u5v5, v5v6, v1v6, C2(v6), A(v1)

44-3e31e1  A(v3), C1(v4), v3v4, v2v3, u2v2, v1v2, v1v4, v4v5, u5v5, v5v6, v1v6, C2(v6), A(v1)

Other  patterns.  The  next  result  ensures  that  all  two-faced  obstructions  with  other 
patterns  are  already  known  (either  by  Lemma  6.2  or  by  type  2  replacements). 

Lemma 8.2. Obstruction 11.2.1 is the only two-faced outerplane obstruction for 3-GML
with edge-adjacent faces in which one face has five or more vertices each with degree at
least three.

Proof. Assume otherwise for some obstruction G with faces F1 and F2, where
F1 ∩ F2 = v1vm, m ≥ 5, and vertices v2, v3, ..., v_{m-1} of F2 each has an attachment.
Since G does not by assumption contain obstruction 11.2.1 (35-01e100), the
attachment at v2 or v4 must be a pendant edge, and the attachment at v3 must be one or
more pendant paths.

Suppose the attachment at v2 is the pendant edge u2v2. Let G' = G\{u2v2}. Thanks
to Lemma 7.4, G' possesses a cost-three permutation M' in which the overlap of the
face spans for F1 and F2 is column v1vm, the leftmost column of F2. If the attachment
at v4 contains one or more pendant paths, then v1v2 must be the rightmost column in
the span for v1. But this means that a cost-three permutation for G can be constructed
from M', a contradiction. Thus the attachment at v4 is a pendant edge. It follows that
F2 must be a pentagon (else v5 has an attachment with one or more pendant paths and
G properly contains obstruction 11.2.1). Additionally, both v1 and v5 must have
attachments, since otherwise M' can again be modified to produce a cost-three
permutation for G. It is now clear that F1 must be a triangle with vertex set {v1, v5, v6},
and that v6 must have degree two, else G properly contains obstruction 15.1.1
(6-1e1e1e). Also, the attachment at v1 or v5 must be a single pendant path, else
G contains obstruction 17.1.1 (5-2e1e2). But this means that G is a minor of the graph
with pattern-string 35-3e3e10, which has cost-three permutation A(v1), C1(v5), v1v5v6,
v1v2, u2v2, v4v5, u4v4, v3v4, v2v3, A(v3), again a contradiction.

Suppose the attachment at v2 is one or more pendant paths. The attachment at v4
must be a pendant edge, from which it again follows that F2 must be a pentagon,
reducing this by symmetry to the previous case. □

Lemma  8.3.  There  are  39  obstructions  for  3-GML  that  have  exactly  two  faces. 

In  summary,  sixteen  new  two-faced  obstructions  exist,  bringing  the  total  number  of 
known  obstructions  up  to  86. 

8.5.  Obstructions  with  three  faces 

To  identify  obstructions  with  three  faces  some  of  which  are  adjacent  at  and  only 
connected  through  a  single  vertex,  we  again  apply  the  reverse  of  the  replacement  used 
in  the  proof  of  Lemma  7.10.  Table  2  summarizes  the  three-faced  obstructions  thereby 
obtained. 

Table 2
Three-faced obstructions from Lemma 7.10

Starting two-faced    Resultant three-faced
obstruction           obstruction(s)

13.2.1                11.3.1
15.2.2                13.3.5
15.2.3                13.3.6
15.2.4                13.3.2
15.2.5                13.3.3
15.2.6                13.3.3
15.2.7                13.3.4
16.2.1                12.3.1
16.2.2                14.3.3^a, 14.3.5^a
16.2.3                14.3.4^a, 14.3.6^a
16.2.4                13.3.1, 14.3.2^a
16.2.5                12.3.1
16.2.6                12.3.1
17.2.1                13.3.1
17.2.2                15.3.1^a
17.2.3                13.3.1, 13.3.5
17.2.4                13.3.1, 13.3.6
17.2.5                15.2.1
17.2.6                15.2.2, 15.3.3^a
17.2.7                15.2.3, 15.3.4^a
18.2.1                15.2.1
18.2.2                14.3.1^a, 16.2.1
19.2.1                15.3.2^a, 16.2.1
20.2.1                16.3.1^a, 17.2.1

^a New obstruction.

In any additional three-faced obstruction, each face must be edge adjacent to at
least one other. Furthermore, the three faces cannot be mutually edge adjacent, else
the graph contains K4.

Lemma 8.4. No outerplane obstruction for 3-GML contains faces F1, F2, and F3,
where F1 ∩ F3 = ∅, such that both F1 and F3 are edge adjacent to F2.

Proof. Assume otherwise for some obstruction G with faces F1, F2, and F3 for which
F1 ∩ F2 = v1vr and F2 ∩ F3 = vivj, where 1 < i < j < r.

Suppose F2 is a square with vertex set {v1, v2, v_{r-1}, vr}. Let G' = G\{v1vr}, and let
F2' denote the (enlarged) face that results from the removal of v1vr from F2. G'
possesses a cost-three permutation M' in which the overlap of the spans for F2' and F3
is column v2v_{r-1}, the leftmost column of F3. If both v1 and vr have attachments, then
their spans must include the leftmost column of F2', and v1v2 can be placed to the
immediate left of the span for F2' to obtain a cost-three permutation for G,
a contradiction. Thus v1 or vr (and analogously v2 or v_{r-1}) has no attachment. If neither v2 nor
vr has an attachment, then v1v2 can be moved to the immediate left of v2v_{r-1} and
v_{r-1}vr can be moved to the immediate left of v1v2, making it easy to construct
a cost-three permutation for G, a contradiction. Thus v2 or vr (and analogously v1 or
v_{r-1}) has an attachment. So, without loss of generality, assume both v1 and v2 have
attachments. Let G'' denote the graph obtained from G by contracting edge v_{r-1}vr to
vr, and let F'' denote the triangle with vertex set {v1, v2, vr}. G'' possesses a cost-three
permutation M'' in which v1v2 lies between v1vr, the rightmost column of F1, and v2vr,
the leftmost column of F3. M'' can now be modified by adding row v_{r-1}, replacing v2vr
by v_{r-1}vr and v2v_{r-1}, and, in every column to the right of v2v_{r-1}, interchanging the
contents of rows vr and v_{r-1}, thereby producing a cost-three permutation for G,
a contradiction.

$F_2$ must therefore have five or more vertices. Without loss of generality, assume $v_2$ does not lie on $F_3$ and has degree three or more. The attachment at $v_2$ must be the pendant edge $u_2v_2$ and $v_3$ must lie on $F_3$, else $G$ contains obstruction 8.3.1 (343-010000). But now it is a simple matter to modify a cost-three permutation for $G \setminus \{u_2v_2\}$ to obtain a cost-three permutation for $G$, again a contradiction. □
Three triangles. Since known obstruction 13.2.1 has pattern-string 34-02200, and since removal of $v_1v_4$ (or $v_2v_4$) leaves an edge-adjacent triangle and square, we do not consider any graph in which both $v_2$ and $v_3$ (or both $v_1$ and $v_5$) have two or more pendant paths as attachments. Any graph with attachments at all five vertices contains new obstruction 13.3.7 denoted by pattern-string 333-1 lele, from which new obstructions 13.3.8 and 13.3.9 are obtained with type 2 replacements. Pattern-string 333-21^20 denotes new obstruction 16.3.2, from which new obstruction 16.3.3 is obtained with a type 2 replacement. Pattern-string 333-22030 denotes new obstruction 19.3.1, from which new obstruction 18.3.1 is obtained with a type 1 replacement. Graphs with pattern-strings 333-00232 and 333-02032 contain known obstruction 18.2.2 (33-0232). The graph with pattern-string 333-12021 contains known obstruction 17.2.2 (34-21120). All other possibilities are minors of a graph whose cost-three permutation resides in the following list.

333-00323  A(v3),B1(v4),v2v3v4,v1v2v4,v1v4v5,B2(v4),A(v5) 

333-01313  A(v3),C1(v2),v2v3v4,v1v2v4,v1v4v5,C2(v4),A(v5)

333-01331  A(v3),C1(v2),v2v3v4,v1v2v4,v1v4v5,C2(v5),A(v4)

333-03023  A(v2),B1(v4),v2v3v4,v1v2v4,v1v4v5,B2(v4),A(v5)

333-03113  A(v2),C1(v3),v2v3v4,v1v2v4,v1v4v5,C2(v4),A(v5)

333-03131  A(v2),C1(v3),v2v3v4,v1v2v4,v1v4v5,C2(v5),A(v4)

333-11033  Aiv^Ciivilv^^v^^^ViV^sXiiviXAivs) 

333-11303  A(v3),C1(v2),v2v3v4,v1v2v4,v1v4v5,C2(v1),A(v5)

333-13013  A(v2),C1(v4),v2v3v4,v1v2v4,v1v4v5,C2(v1),A(v5)

333-13103  A(v2),C1(v3),v2v3v4,v1v2v4,v1v4v5,C2(v1),A(v5)

333-31031  A(v4),C1(v2),v2v3v4,v1v2v4,v1v4v5,C2(v5),A(v1)

333-33011  A(v2),C1(v4),v2v3v4,v1v2v4,v1v4v5,C2(v5),A(v1)

333-33020  A(v2),B1(v4),v2v3v4,v1v2v4,v1v4v5,B2(v4),A(v1)

333-33101  A(v2),C1(v3),v2v3v4,v1v2v4,v1v4v5,C2(v5),A(v1)

Other  patterns.  The  next  result  ensures  that  all  three-faced  obstructions  with  other 
patterns  are  already  known  (either  by  Lemma  6.2  or  by  type  2  replacements). 

Lemma  8.5.  Obstruction  8.3.1  is  the  only  three-faced  outerplane  obstruction  for  3-GML 
in  which  each  face  is  edge  adjacent  to  at  least  one  other  and  one  face  has  four  or  more 
vertices  each  with  degree  at  least  three. 

Proof. Assume otherwise for some obstruction $G$ with faces $F_1$, $F_2$, and $F_3$ such that both $F_1$ and $F_3$ are edge adjacent to $F_2$. Thanks to Lemma 8.4, we may assume $F_1 \cap F_2 = v_1v_r$ and $F_2 \cap F_3 = v_iv_r$.
Suppose $F_2$ is not a triangle. To avoid obstruction 8.3.1 (343-010000), $F_2$ must be a square with vertex set $\{v_1, v_2, v_3, v_r\}$, and $v_2$ must be adjacent to pendant vertex $u_2$. Let $G' = G \setminus \{u_2v_2\}$. $G'$ possesses a cost-three permutation $M'$ in which $v_1v_2$ and $v_2v_3$ lie between $v_1v_r$, the rightmost column of $F_1$, and $v_3v_r$, the leftmost column of $F_3$. It is straightforward to verify that $v_1v_2$ contains the rightmost 1 in row $v_1$, that $v_2v_3$ is to the immediate right of $v_1v_2$, and that $u_2v_2$ can be inserted in $M'$ to produce a cost-three permutation for $G$, a contradiction.

Thus $F_2$ must be a triangle. Without loss of generality, assume $F_3$ has at least four vertices each with degree at least three. If $v_2$ has an attachment, then to avoid obstruction 11.2.1 (35-0M00) it follows that $F_3$ must be a square with vertex set $\{v_2, v_3, v_4, v_5\}$, the attachment at $v_4$ is the pendant edge $u_4v_4$, and the attachment at $v_3$ contains at least one pendant path. Let $G'' = G \setminus \{u_4v_4\}$ and let $M''$ denote a cost-three permutation for $G''$ in which the span for $F_3$ is to the right of column $v_1v_2v_5$. Since $v_4v_5$ must be the rightmost column in the span for $v_5$, it is straightforward to construct a cost-three permutation for $G$, a contradiction. Thus $v_2$ can have no attachment. Let $G'''$ denote the graph obtained from $G$ by contracting edge $v_2v_3$ to $v_2$, and let $F_3'$ denote the (shrunken) face that results from this contraction in $F_3$. Using a cost-three permutation for $G'''$ in which the span for $F_3'$ is to the right of column $v_1v_2v_r$, it is again straightforward to construct a cost-three permutation for $G$, a contradiction. □

Lemma  8.6.  There  are  29  obstructions  for  3-GML  that  have  exactly  three  faces. 

In  summary,  eighteen  new  three-faced  obstructions  exist,  bringing  the  total  number 
of  known  obstructions  up  to  104. 

8.6.  Obstructions  with  four  faces 

To identify obstructions with four faces some of which are adjacent at and only connected through a single vertex, we again apply the reverse of the replacement used in the proof of Lemma 7.10. Table 3 summarizes the four-faced obstructions thereby obtained.

In  any  additional  four-faced  obstruction,  each  face  must  be  edge  adjacent  to  at  least 
one  other.  One  face  cannot  be  edge  adjacent  to  the  other  three,  else  the  graph  contains 
known  obstruction  6.4.1.  Furthermore,  to  avoid  K4 ,  at  least  two  faces  must  be  edge 
adjacent to exactly one other face. Our next result ensures that all four-faced obstructions are already known.

A chain in an outerplane graph is a sequence of faces $F_1, F_2, \ldots, F_h$ such that $F_i$ and $F_j$ intersect at a single edge if $|i - j| = 1$, and are either disjoint or intersect at a single vertex otherwise. The length of a chain is the number of faces it contains. Fig. 8 illustrates four different four-faced chains.


Fig. 8. Sample four-faced chains.
Table 3
Four-faced obstructions from Lemma 7.10

Starting three-faced obstruction    Resultant four-faced obstruction(s)
11.3.1                              9.4.2
13.3.1                              9.4.1
13.3.5                              9.4.1
13.3.6                              9.4.1
14.3.1                              12.3.1
14.3.2                              9.4.1
14.3.3                              12.3.1
14.3.4                              12.3.1
14.3.5                              12.3.1
14.3.6                              12.3.1
15.3.1                              12.4.1^a
15.3.2                              12.3.1
15.3.3                              12.4.1^a, 13.3.1
15.3.4                              12.4.1^a, 13.3.1
16.3.1                              12.4.1^a, 13.3.1
16.3.2                              14.4.1^a, 14.4.3^a
16.3.3                              14.4.2^a, 14.4.4^a
18.3.1                              15.3.2
19.3.1                              15.4.1^a, 16.3.1

^a New obstruction.


Lemma  8.7.  No  obstruction  for  3-GML  contains  a  chain  whose  length  exceeds  three. 

Proof. Assume otherwise for some obstruction $G$ with chain $F_1, F_2, \ldots, F_h$ where $h \ge 4$ and $F_i \cap F_{i+1} = v_iw_i$ for $1 \le i < h$.

Thanks to Lemma 8.4, we assume without loss of generality that $w_1 = w_2$. To avoid obstruction 8.3.1, $G$ must contain either $v_1v_2$ or a degree-three vertex $x$ adjacent to $v_1$, $v_2$ and pendant vertex $y$.

Let $G' = G \setminus \{v_2w_2\}$, and let $F_2'$ denote the (enlarged) face that results from the removal of $v_2w_2$ from $F_2$. $G'$ possesses a cost-three permutation in which the overlap of the spans for $F_1$ and $F_2'$ is $v_1w_1$, the leftmost column of $F_2'$. Since any attachment at $v_1$ must lie to the left of the span for $F_1$, since the span for $F_4$ must be to the right of $v_1w_1$, and since outerplanarity ensures $v_1 \notin F_4$, column $v_1v_2$ (or column $v_1x$) must contain the rightmost 1 in row $v_1$. Thus, with no increase in cost, column $v_1v_2$ (or the set of columns $v_1x$, $xy$, $xv_2$) may be moved to the immediate right of $v_1w_1$, from which it is straightforward to construct a cost-three permutation for $G$, a contradiction. □

Lemma  8.8.  There  are  nine  obstructions  for  3-GML  that  have  exactly  four  faces. 

In  summary,  six  new  four-faced  obstructions  exist,  bringing  the  total  number  of 
known  obstructions  up  to  110.  We  shall  now  show  that  there  are  no  more. 
Table 4
A review of the 3-GML obstruction set

Number of faces    Number of obstructions
none               10
one                23
two                39
three              29
four               9
five or more       0

8.7.  Obstructions  with  five  or  more  faces

Lemma  8.9.  No  obstruction  for  3-GML  contains  five  or  more  faces. 

Proof.  The  reverse  of  the  replacement  used  in  the  proof  of  Lemma  7.10  generates  only 
known  obstructions  9.4.1  and  12.4.1.  Thus  there  can  be  no  obstruction  with  five  or 
more  faces  some  of  which  are  adjacent  at  and  only  connected  through  a  single  vertex. 
Thanks  to  Lemmas  7.8  and  8.7,  no  obstruction  can  contain  either  separated  faces  or 
a  chain  whose  length  exceeds  three.  □ 


9.  Main  result 

All  elements  of  the  3-GML  obstruction  set  are  now  known.  The  structure  of  this  set 
is  reviewed  in  Table  4. 

Theorem 9.1. There are exactly 110 obstructions for 3-GML, namely, those identified in preceding results and depicted in the appendix.
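
As a quick consistency check, the count in Theorem 9.1 agrees with the per-face totals reviewed in Table 4:

```latex
\[
\underbrace{10}_{\text{no faces}} + \underbrace{23}_{\text{one}} + \underbrace{39}_{\text{two}}
+ \underbrace{29}_{\text{three}} + \underbrace{9}_{\text{four}} + \underbrace{0}_{\text{five or more}} = 110 .
\]
```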


10.  Conclusions 

Gate  matrix  layout  is  a  well-known  but  notoriously  difficult  problem.  Each  of  its 
fixed-parameter variants, however, possesses a finite-basis characterization that provides a polynomial-time recognition algorithm. In this paper, we have isolated the
basis  for  parameter  value  three.  In  order  to  accomplish  this,  we  have  also  derived 
a  number  of  more  general  results  to  bound  and  identify  basis  elements  for  any 
parameter  value. 

We  conjecture  that  the  trees  are  the  largest  elements  in  each  basis.  A  proof  of  this,  if 
it  is  indeed  true,  would  be  particularly  interesting,  because  it  would  automatically 
mean that every basis is computable. (Exhaustive computation could, at least in principle, be applied until the trees were reached, after which it would be pointless to look further.)

Lemma  6.1  makes  it  easy  to  see  that  basis  size  grows  monotonically.  This  and  the 
fact  that  the  basis  for  parameter  value  four  contains  at  least  122  million  elements  [12] 
suggest  that  no  other  bases  for  this  problem  are  likely  to  be  isolated  in  the  foreseeable 
future. 
In an earlier research paper,9 we presented a novel, yet straightforward linear-time algorithm for merging two sorted lists in a fixed amount of additional space. Constant of proportionality estimates and empirical testing reveal that this procedure is reasonably competitive with merge routines free to squander unbounded additional memory, making it particularly attractive whenever space is a critical resource. In this paper, we devise a relatively simple strategy by which this efficient merge can be made stable, and extend our results in a nontrivial way to the problem of stable sorting by merging. We also derive upper bounds on our algorithms' constants of proportionality, suggesting that in some environments (most notably external file processing) their modest run-time premiums may be more than offset by the dramatic space savings achieved.
1.  INTRODUCTION

It is a well-recognised phenomenon that no matter how
much  main  memory  (also  known  as  core  memory, 
directly-addressable  memory,  non-virtual  memory  or 
real  memory)  is  made  available  to  a  collection  of  users 
and  systems,  it  seems  never  to  be  enough  to  satisfy 
everyone  completely.  Although  main  memory  is  often 
rather  cavalierly  regarded  as  an  inexpensive  resource,  its 
availability  is  in  fact  critical  in  many  applications.  We  are 
reminded  of  the  following  passage  by  the  witty  and 
imaginative  science  writer  D.  E.  H.  Jones:13 

Let's  assume  that  the  brain,  like  most  computers, 
stores  intelligence  (programs)  and  memory  (data)  in 
the  same  form  and  distributed  throughout  the  same 
volume.  Then  the  more  space  is  taken  up  by  data  the 
less  is  available  for  programs  and  working  space. 
Clearly  as  life  progresses  and  memories  multiply, 
there  must  come  a  time  when  programs  and  working 
space  get  squeezed.  This  must  be  senility. 

Naturally,  main  memory  should  be  allocated  and 
managed  carefully  to  avoid  thrashing2  and  other  forms 
of  computer  senility.  This  is  particularly  true  for  heavily- 
used  operations  like  merging  and  sorting,  known  to 
dominate  a  large  portion  of  all  available  execution  time 
over  broad  classes  of  computer  systems.14  This  is  even 
more  evident  when  performing  these  operations  over 
enormous  external  files.  In  such  an  environment,  the 
overall  processing  time  is  frequently  determined  not  by 
the  speed  of  the  algorithm  used  to  process  file  segments 
internally,  but  rather  by  the  ways  in  which  available 
main  memory  can  be  used  to  accommodate  more  and 
larger  buffers,  thereby  increasing  device  and  channel 
parallelism  while  decreasing  the  number  of  I/O  transfers 
required. 

Unfortunately,  for  stable  merging  and  sorting,  the 

*  A  preliminary  version  of  a  portion  of  this  paper  was  presented  at 
the  International  Conference  on  Computing  and  Information,  held  in 
Toronto,  Ontario,  Canada,  in  May  1989. 
under grants ECS-8403859 and MIP-8603879; and by the Office of Naval Research under contract N00014-88-K-0343.


obvious algorithms that work in asymptotically optimal time ($O(n)$ and $O(n \log n)$, respectively) waste a whopping $\Omega(n)$ extra memory cells for temporary storage. Conversely, the conspicuous ways to merge and sort in $O(1)$ extra space are either unstable or require $\Omega(n^2)$ time. A number of stable merging schemes that use more than linear time or more than constant extra space have been suggested,1-3 5 * 25 as have several stable sorting strategies that use more than $O(n \log n)$ time or more than $O(1)$ extra space.7-17 19 20 Also, a routine that dynamically alters keys has been defined,12 but is thus applicable only to files in which keys are explicitly stored within records. The only previously-known general method for stably merging and sorting in both optimal time and optimal extra space22-23 is widely regarded as a result of purely theoretical interest,15,24 since it is exceedingly complex and its time-complexity constant of proportionality is so huge that it hasn't even been derived. (Recent modifications have been suggested that simplify parts of this method, but its overall constant of proportionality remains prohibitively large and unbounded.)21 This contrasts poorly with unstable merging, where much progress has been made in achieving practical and straightforward optimal time and space methods, and with unstable sorting, where the simple heap-sort algorithm suffices.

The main result of this paper is a relatively simple, efficient and general scheme for stable merging (and thus stable merge-sorting) in optimal time and space. Our method is based on our recently-reported algorithm for fast, in-place unstable merging.9 We also present an even more streamlined $O(n \log n)$ time and $O(1)$ extra space stable sorting procedure for files for which there is a reasonable limit on the number of times each key can appear. Significantly, and unlike previously-reported schemes to solve these problems, we derive explicit upper bounds on the number of key comparisons and record exchanges our methods require. Note that these strategies may be especially useful for operations such as stable polyphase, balanced or cascade merge-sorting with external storage media such as tape: larger initial runs mean fewer passes of the file (the common replacement-selection method is unstable), and more available memory for buffer space can mean less time consumed in each pass.
In the next section, we discuss pertinent background information and related work. Section 3 comprises the definitions and notational conventions we shall need to present and analyse our algorithms. In Section 4, we review the fundamental, optimal time and space unstable merge of Ref. 9 and define our modifications that ensure stability. Also, to provide an upper bound on the resultant procedure's worst-case constant of proportionality, we prove that the total number of key comparisons and record exchanges required never exceeds In (plus lower-order terms). Section 5 extends our work to the problem of optimal time and space stable sorting in a nontrivial way. We devise an alternative to the obvious merge-sort strategy, and show that it never needs more than $2.5\,n \log_2 n$ (plus lower-order terms) key comparisons and record exchanges. In the final section, we draw a few conclusions from this effort and pose questions that we believe merit further investigation.

2.  RELATED  WORK 

The general approach that we shall employ inherently relies on the notions of internal buffering and block rearranging, and can be traced back to the seminal work on unstable merging described in Ref. 16. Simply stated, with this approach we attempt to view a list of $n$ records as a sequence of $O(\sqrt{n})$ blocks, each of size $O(\sqrt{n})$. This allows us to employ one block as an internal buffer to aid in rearranging or otherwise manipulating the other blocks in constant extra space. Since only the contents of the buffer and the relative order of the blocks need be out of sequence, linear time is sufficient to perform a merge with the aid of selection sorting both the buffer and the blocks (each sort involves $O(\sqrt{n})$ keys).
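
A rough check of the linear-time claim, assuming for illustration exactly $\sqrt{n}$ blocks of exactly $\sqrt{n}$ records each (the text only requires $O(\sqrt{n})$): a straight selection sort of $m$ keys makes $m(m-1)/2$ comparisons and at most $m-1$ exchanges, so with $m = \sqrt{n}$ the following bounds hold.

```latex
\[
\underbrace{\frac{\sqrt{n}\,(\sqrt{n}-1)}{2}}_{\text{key comparisons}} < \frac{n}{2},
\qquad
\underbrace{(\sqrt{n}-1)\cdot\sqrt{n}}_{\text{record swaps if every exchange moves a whole block}} < n .
\]
```

Both quantities are linear in $n$, even when each of the at most $\sqrt{n}-1$ exchanges moves an entire block.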

After the unstable method in Ref. 16 appeared, a stable procedure was proposed in Ref. 12 that, unfortunately, had the rather undesirable side-effect that records had to be alterable during its execution. Subsequently, a general algorithm for optimal time and space, stable merging and sorting was published,22-23 as was a technique for simplifying parts of its control structure.21 For the most part, however, those results have been of academic interest only, due primarily to their discouraging complexity and their prohibitively large time-complexity constants of proportionality.

More  recent  research  efforts  have  begun  to  focus  on 
simpler,  more  practical  optimal  time  and  space  internal 
buffering  and  block  rearranging  strategies  for  unstable 
merging4  *-18  as  well  as  for  extracting  duplicates  from  a 
sorted  list10  and  for  all  of  the  binary  set  and  multiset 
operations  on  sorted  lists,11  with  potential  application  to 
a  number  of  file  processing  problems. 

3.  NOTATION,  DEFINITIONS  AND 
USEFUL  SUBPROGRAMS 

Let $L$ denote a list (internal file) of $n$ records, indexed from 1 to $n$. An algorithm for rearranging the order of the records of $L$ is said to be stable if it ensures that, when it is done, records with identical keys retain the relative order they had before the algorithm began. We use KEY(i) as a shorthand to denote the key of the record with index $i$. Only the two common $O(1)$ time and space primitive operations are assumed, namely, record exchanges and key comparisons. The exchange procedure, SWAP(i, j), directs that the $i$th and $j$th records are to be exchanged. The comparison functions, for example KEY(i) < KEY(j), return the expected Boolean values dependent on the relative values of the keys being compared.

From these primitive operations, we construct a few $O(1)$ space useful subprograms for dealing with blocks. Let us define a block to be a set of records from $L$ with consecutive indices. The head of a block is the record with the lowest index (or, informally, the 'leftmost' record in the block); the tail of a block is the record with the highest index (the 'rightmost' record in the block). The procedure BLOCKSWAP(i, j, h) exchanges a block of $h$ records beginning at index $i$ with a block of $h$ records beginning at index $j$ in $O(h)$ time. We specify that blocks do not partially overlap (i.e., if $i \ne j$ then $h \le |i - j|$) and that, when BLOCKSWAP is finished, records within a moved block retain the order they possessed before BLOCKSWAP was invoked. A block of $h$ records beginning at index $i$ is sorted in nondecreasing order by the procedure SORT(i, h). The procedure BLOCKSORT(i, h, p) uses BLOCKSWAP to rearrange the $p$ consecutive blocks, each with $h$ records, beginning at index $i$ so that their tails are sorted in nondecreasing order. To reduce unnecessary record movement, an important consideration when records are relatively long, we insist that BLOCKSORT use the $O(p^2 + ph)$ time straight selection sort.14
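
The two block subprograms can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: indices are 0-based (the paper indexes from 1), and the lowercase function names and the `key` parameter are assumptions of this sketch. Ties between tail keys are left in whatever order selection sort happens to produce.

```python
def block_swap(records, i, j, h):
    """Exchange the h records starting at i with the h records starting at j."""
    if i != j and h > abs(i - j):
        raise ValueError("blocks must not partially overlap")
    for offset in range(h):
        a, b = i + offset, j + offset
        records[a], records[b] = records[b], records[a]


def block_sort(records, i, h, p, key=lambda r: r):
    """Selection-sort p consecutive blocks of h records (starting at index i)
    so that the keys of their tails end up in nondecreasing order."""
    def tail_key(block_no):
        return key(records[i + block_no * h + h - 1])

    for slot in range(p - 1):
        smallest = slot
        for cand in range(slot + 1, p):        # O(p) comparisons per slot
            if tail_key(cand) < tail_key(smallest):
                smallest = cand
        if smallest != slot:                   # at most one O(h) block swap per slot
            block_swap(records, i + slot * h, i + smallest * h, h)
```

Selection sort is used here for the reason the text gives: it performs at most $p - 1$ block exchanges, so the $O(h)$ cost of moving a block is paid only $O(p)$ times.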

The procedure ROTATE(i, h, t) rotates (circularly shifts) a block of $h$ records, beginning at index $i$, $t$ places to the left. We assume that ROTATE is implemented in the common fashion with three sublist reversals, thereby requiring no more than $h$ invocations of SWAP.
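
The three-reversal rotation can be sketched as below, again with 0-based indices and function names chosen for this illustration. Each function returns the number of swaps it performed so that the bound of $h$ swaps can be observed directly.

```python
def reverse(records, lo, hi):
    """Reverse records[lo..hi] (inclusive) in place; return the number of swaps."""
    swaps = 0
    while lo < hi:
        records[lo], records[hi] = records[hi], records[lo]
        lo += 1
        hi -= 1
        swaps += 1
    return swaps


def rotate(records, i, h, t):
    """Rotate the block of h records starting at index i by t places to the left.

    Reverse the first t records, reverse the remaining h - t records, then
    reverse the whole block. Returns the total number of swaps used.
    """
    if h <= 0:
        return 0
    t %= h
    if t == 0:
        return 0
    swaps = reverse(records, i, i + t - 1)
    swaps += reverse(records, i + t, i + h - 1)
    swaps += reverse(records, i, i + h - 1)
    return swaps
```

The swap count is $\lfloor t/2 \rfloor + \lfloor (h-t)/2 \rfloor + \lfloor h/2 \rfloor \le h$, matching the bound stated above.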

Finally, a pair of consecutive blocks, each sorted in nondecreasing order, is merged with BLOCKMERGE(i, h, k), where the first block contains $h$ records beginning at index $i$ and the second contains $k$ records beginning at index $i+h$. BLOCKMERGE uses ROTATE to merge the shorter block into the longer one. For example, if $h \le k$, then BLOCKMERGE merges the first block forward into the second as follows. A binary search of the second block is used to find the leftmost insertion point for the leftmost record of the first block. That is, assuming $KEY(i+h) < KEY(i) \le KEY(i+h+k-1)$, the displacement $p$ is computed for which $KEY(i+h+p) < KEY(i) \le KEY(i+h+p+1)$, followed by an invocation of ROTATE(i, h+p, h). The first record of the shorter block and all records to its left are now merged. The merge is completed by iterating this operation until one of the blocks is exhausted, resulting in a time complexity of $O(h^2 + k)$. (There are at most $O(h \log k)$ comparisons. Records from the shorter block are moved no more than $h$ times, while records from the longer block are moved only once.) Of course, if $h > k$, then BLOCKMERGE is better off to merge the second block backward into the first in $O(h + k^2)$ time.
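
The forward case ($h \le k$) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: indices are 0-based, the names `rotate_left` and `block_merge_forward` and the `key` parameter are invented for the sketch, and `p` here counts the second-block records whose keys are strictly smaller than the current head of the first block, so the rotation length follows this sketch's own convention.

```python
def _reverse(records, lo, hi):
    while lo < hi:
        records[lo], records[hi] = records[hi], records[lo]
        lo += 1
        hi -= 1


def rotate_left(records, i, h, t):
    """Rotate records[i:i+h] by t places to the left using three reversals."""
    if h <= 1:
        return
    t %= h
    if t == 0:
        return
    _reverse(records, i, i + t - 1)
    _reverse(records, i + t, i + h - 1)
    _reverse(records, i, i + h - 1)


def block_merge_forward(records, i, h, k, key=lambda r: r):
    """Stably merge sorted records[i:i+h] forward into sorted records[i+h:i+h+k]."""
    first, remaining, end = i, h, i + h + k
    while remaining > 0:
        second = first + remaining          # start of the unmerged second block
        if second >= end:
            break                           # second block exhausted
        target = key(records[first])
        lo, hi = second, end                # binary search: leftmost insertion point
        while lo < hi:
            mid = (lo + hi) // 2
            if key(records[mid]) < target:
                lo = mid + 1
            else:
                hi = mid
        p = lo - second                     # second-block records strictly smaller
        # Bring those p records in front of what is left of the first block.
        rotate_left(records, first, remaining + p, remaining)
        first += p + 1                      # p moved records plus one first-block record are final
        remaining -= 1
```

Choosing the leftmost insertion point keeps a first-block record ahead of any second-block record with an equal key, which is what makes this merge stable.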

4.  STABLE  IN-PLACE  MERGING

4.1  A  Review  of  the  Fundamental,  Unstable  Merge 

Suppose $L$ contains two sublists to be merged, each with its keys in nondecreasing order. In Ref. 9 we presented a fast and surprisingly simple algorithm for (unstably) merging in linear time and constant extra space. Even without 'tinkering' with it to achieve an especially efficient implementation, its average run time on large lists exceeds that of the standard, widely-used merge (which is free to exploit $O(n)$ temporary extra memory cells) by less than a factor of two. Aspects that contribute to its straightforwardness include a rearrangement of blocks before a merging phase is initiated and an efficient pass of the internal buffer across the list to reduce unnecessary record movement.

We now briefly review the central features of this $O(n)$ time and $O(1)$ extra space method, with a number of simplifying assumptions made about $L$ to facilitate discussion. We refer the reader to Ref. 9 for a complete exposition of the algorithm, an example, and the $O(n)$ time and $O(1)$ space implementation details necessary for handling arbitrary inputs.

Let us suppose that $n$ is a perfect square, and that we have already permuted the records of $L$ so that the $\sqrt{n}$ largest-keyed records are at the front of the list (their relative order there is immaterial), followed by the remainders of the two sublists, each of which we now assume contains an integral multiple of $\sqrt{n}$ records in nondecreasing order.

Therefore, we view $L$ as a series of $\sqrt{n}$ blocks, each of size $\sqrt{n}$. We will use the leading block as an internal buffer to aid in the merge. Our first step is to invoke BLOCKSORT on the $\sqrt{n} - 1$ rightmost blocks, after which their tails form a nondecreasing key sequence. (In this setting, selection sort requires only $O(n)$ key comparisons and record exchanges.) Records within a block retain their original relative order.

Next, we locate two series of records to be merged. The first series begins with the head of block 2 and terminates with the tail of block $i \ge 2$, where block $i$ is the first block such that the key of the tail of block $i$ exceeds the key of the head of block $i+1$. The second series consists solely of the records of block $i+1$. We now use the buffer to merge these two series. That is, we repeatedly compare the leftmost unmerged record in the first series to the leftmost unmerged record in the second, swapping the smaller-keyed record with the leftmost buffer element. Ties are broken in favour of the leftmost series. (In general, the buffer may be broken into two pieces as we merge.) We halt this process when the tail of block $i$ has been moved to its final position.

We now locate the next two series of records to be merged. This time, the first begins with the leftmost unmerged record of block $i+1$ and terminates as before for some $j > i$. The second consists solely of the records of block $j+1$. We resume the merge until the tail of block $j$ has been moved.

We continue this process of locating series of records and merging them until we reach a point where only one such series exists, which we merely shift left, leaving the buffer in the last block. A sort of the buffer completes the merge of $L$.

$O(1)$ space suffices for this procedure, since the buffer was internal to the list, and since only a handful of additional pointers and counters are necessary. $O(n)$ time suffices as well, since the block sorting, the series merging and the buffer sorting each require at most linear time.
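
The review above can be turned into a short sketch that works only under the stated simplifying assumptions: $n = s^2$, the first $s$ positions already hold the $s$ largest keys, and the rest is two sorted sublists whose lengths are multiples of $s$. It is not the implementation of Ref. 9. Indices are 0-based, the function name is invented, records are compared directly as keys, and blocks with equal tail keys are ordered by their head keys. That tie-break is an assumption added for this sketch, which the review above does not spell out.

```python
import math


def simplified_inplace_merge(L):
    """Unstable in-place merge under the simplifying assumptions of Section 4.1."""
    n = len(L)
    s = math.isqrt(n)
    if s * s != n:
        raise ValueError("this sketch assumes n is a perfect square")

    def swap(a, b):
        L[a], L[b] = L[b], L[a]

    def block_key(b):
        # Tail key first; head key as tie-breaker (assumption of this sketch).
        return (L[b * s + s - 1], L[b * s])

    # Phase 1: selection-sort blocks 1..s-1 (block 0 is the internal buffer).
    for slot in range(1, s - 1):
        best = slot
        for cand in range(slot + 1, s):
            if block_key(cand) < block_key(best):
                best = cand
        if best != slot:
            for off in range(s):
                swap(slot * s + off, best * s + off)

    # Phase 2: locate series and merge them through the buffer.
    out = 0   # leftmost buffer slot, i.e. the next output position
    x = s     # leftmost unmerged record
    while x < n:
        i = x // s
        while i < s - 1 and L[i * s + s - 1] <= L[(i + 1) * s]:
            i += 1                      # extend the first series
        if i == s - 1:                  # only one series left: shift it left
            while x < n:
                swap(out, x)
                out += 1
                x += 1
            break
        m = (i + 1) * s                 # the second series is block i + 1
        y = m
        while x < m:                    # halt once the tail of block i is placed
            if y < m + s and L[y] < L[x]:
                swap(out, y)
                y += 1
            else:                       # ties favour the first series
                swap(out, x)
                x += 1
            out += 1
        x = y                           # next series starts in block i + 1

    # Phase 3: the buffer now occupies the last s positions; insertion-sort it.
    for a in range(n - s + 1, n):
        b = a
        while b > n - s and L[b] < L[b - 1]:
            swap(b, b - 1)
            b -= 1
    return L
```

For example, with $s = 4$, the input `[40, 30, 50, 35, 1, 3, 5, 7, 9, 11, 13, 15, 2, 2, 6, 20]` has its buffer in the first four positions, followed by a sorted sublist of eight records and a sorted sublist of four. The function rearranges it into sorted order using only swaps.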


4.2  Obstacles  to  Stability 

The primary problem to be addressed in order to achieve stability is the need to be able to distinguish blocks as to whether each originated in the first or the second sublist. This is a more difficult task than it may seem at first blush. A number of schemes will do if each block has different keys at its head and its tail. For example, we could simply make a temporary swap of the records at the head and tail of a block if and only if it originated in, say, the second sublist. (Such a swap would be made during the block-sorting phase and undone during the series-merging phase.) The real problem lies with homogeneous blocks, those in which every record in the block has the same key as every other record in the block. To illustrate this conundrum, suppose we know by some artifice that block $i > 2$ originated in the first sublist but only that block $i-1$ is homogeneous. Also, suppose the key of the tail of block $i-1$ equals the key of the head of block $i$, but is strictly less than the key of the tail of block $i$. In this circumstance, stability is jeopardized since we cannot determine whether the head of block $i$ should be merged to the left or to the right of the records of block $i-1$. (It should go to the left if block $i-1$ originated in the second sublist, but to the right otherwise.)
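
The head/tail swap scheme, and the way homogeneous blocks defeat it, can be seen in a few lines. The function names and the `key` parameter below are hypothetical and chosen for this sketch; indices are 0-based and each block is assumed to be sorted before it is marked.

```python
def toggle_mark(records, start, h):
    """Mark (or unmark) the block of h records at `start` by swapping head and tail."""
    tail = start + h - 1
    records[start], records[tail] = records[tail], records[start]


def looks_marked(records, start, h, key=lambda r: r):
    """A marked block is recognisable only if its head key now exceeds its tail key."""
    return key(records[start]) > key(records[start + h - 1])


def is_homogeneous(records, start, h, key=lambda r: r):
    """In a sorted block, equal head and tail keys mean every key is the same."""
    return key(records[start]) == key(records[start + h - 1])
```

Marking the block `[1, 2, 3]` yields `[3, 2, 1]`, which `looks_marked` detects. Marking a block whose keys are all 5 exchanges two records with equal keys, so the block still looks unmarked and its origin cannot be recovered from the keys alone.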

Additionally, we must be wary of a few other details that, if neglected, can compromise stability. For example, we need to load the buffer with records having distinct keys if that is possible, since the buffer's contents are arbitrarily permuted during the series-merging phase. Correspondingly, we must provide for the special case in which there are not enough distinct keys to fill the buffer. We also want to make the BLOCKSORT subprogram of the block-sorting phase stable, because otherwise a large collection of homogeneous blocks may be unpredictably rearranged. Finally, as with the fundamental unstable merge,9 we need to specify implementation details for handling lists and sublists of arbitrary sizes.

4.3  The  Main  Idea 

Since the possibility of troublesome homogeneous
blocks prevents the use of any simple scheme for
identifying individual blocks as to their origin, we shall
seek instead to devise a strategy by which we can
distinguish a series of consecutive blocks from the first
sublist from a series of consecutive blocks from the
second. To this end, it is enough if we can be sure of the
first and last block in every series from the second sublist.

We shall encode this information in L during the
(stable) block-sorting phase and decode it during the
series-merging phase. To encode, we use two memory
cells, one to point to the (post-sorting) position of the
leftmost block of the leftmost second-sublist series and
one to point to the (post-sorting) position of the rightmost
block of the rightmost second-sublist series. As the sort
phase progresses, we mark each series by exchanging the
head of the rightmost block of one second-sublist series
with the tail of the leftmost block of the next
second-sublist series.

We thus make use of the fact that one or more blocks
from the first sublist must lie between the blocks we have
modified, insuring that we can correctly decode the series
delimiters during the series-merging phase. That is, it
directly follows that the head of the rightmost block of a
second-sublist series will temporarily have a key strictly
greater than that of the record to its immediate right,
while the tail of the leftmost block of a second-sublist
series will temporarily have a key strictly less than that of
the tail of the block to its immediate left. Of course, with
this stringent mechanism for defining each distinct series
to be merged, we do not employ the simpler criteria used
in the unstable merge to locate series. Now, as we merge,
we undo the exchanges that delimit the series and always
break ties in favour of the series from the first sublist.
$O(\sqrt{n})$ time and $O(1)$ space are sufficient for this scheme.

4.4  Other  Relevant  Details 

We attempt to load the internal buffer with
distinct-keyed records as follows. We begin at the right end of the
first sublist and scan to the left. When a comparison of
adjacent keys reveals that the leftmost copy of a key has
been found, that record is coalesced into the buffer.
Other records are exchanged with the rightmost current
buffer element. Therefore, the buffer begins with size zero
and grows as we 'roll' it to the left. When it has attained
size $\sqrt{n}$, we invoke ROTATE to left-justify it. At the end
of the series-merging phase (the buffer is now
right-justified), we stably merge the buffer with the remainder
of the list with a backward BLOCKMERGE using
leftmost insertion points.

In the event that we exhaust the first sublist without
filling the internal buffer, we must employ fewer but
larger blocks. Specifically, if we obtain only $s < \sqrt{n}$
buffer elements, then we use $s$ blocks, each of size at most
$\lceil n/s \rceil$. Although this permits the use of the stable
BLOCKSORT described below, it is of no help in the
merging phase. Fortunately, however, such a small
number of distinct keys in the first sublist ensures that we
can, in $O(n)$ time, stably merge the sorted series of blocks
with a left-to-right series of backward BLOCKMERGE
operations, each using the proper insertion point, which
is the leftmost if the left series is from the second sublist,
and the rightmost otherwise. Since the buffer does not in
this scheme end up adjacent to the unmerged suffix of the
right series when the left is exhausted, we use a pointer to
indicate this boundary, namely, the location of the
leftmost record in the right series not moved by ROTATE.
(Although this method is easy to implement, it is perhaps
not obvious that it takes only linear time. See
Section 4.5.) We then stably merge the buffer with the
remainder of the list with a forward BLOCKMERGE
using leftmost insertion points.

Our BLOCKSORT implementation must be stable.
This is easily achieved by first invoking SORT on the
buffer and then using it to 'remember' the original block
sequence. That is, we exchange each buffer element with
the proper block's tail before blocks are rearranged, and
then undo each exchange as the corresponding block is
selected by BLOCKSORT. This simple scheme, a
variation of the 'segment insertion process' used in Ref.
23, thus restores the tails in time to perform the
series-encoding task (as described in Section 4.3) as the sort
progresses.

Finally, for lists and sublists of arbitrary sizes, we
employ a method analogous to the one we used for
unstable merging.9 This gives potential rise to one small
block (of size less than $\lfloor\sqrt{n}\rfloor$) at the extreme right end of
the list, and one at the left end to the immediate right of
the buffer. For the right block, no modification is
necessary. For the left one, we observe that in general a
ROTATE may be necessary to insert the block in its
proper place, after BLOCKSORT is finished, when a
second-sublist series should precede it.

4.5  Constant  of  Proportionality  Bounds 

In  an  effort  to  measure  the  practical  potential  of  this 
stable,  optimal  time  and  space  merge,  we  shall  study  the 
number  of  key  comparisons  and  record  exchanges  it 
demands. These two primitives are generally regarded as
by far the most time-consuming operations for internal
file processing, requiring storage-to-storage instructions
for many architectures. Since it is possible to count them
independently from the code of any particular
implementation, their total gives a meaningful estimate of
the size of the linear-time constant of proportionality for
the algorithm we have devised. (As for the issue of
constant extra space, a careful review of our method
reveals that a couple of dozen additional storage cells is
all we need for use as pointers and counters.)

We now proceed to derive a worst-case bound on the
key-comparison and record-exchange sum. For
simplicity, we allow for a (possibly unrealizable) worst-case
scenario, implying that the figures we produce may be
rather conservative upper bounds. (This is offset to some
extent, especially for small inputs, by the fact that we
are ignoring operations bounded above by lower-order
terms. For example, sorting the buffer can be
accomplished in-place with heap-sort in $O(\sqrt{n}\log n)$ time.
In fact, our main idea for achieving stability needs only
$O(\sqrt{n})$ time.) Let $n_1$ ($n_2$) denote the size of the first
(second) sublist, and thus $n_1 + n_2 = n$.

Consider the general case, in which there are plenty of
distinct keys to fill the buffer. Extracting the buffer uses
at most $n_1$ comparisons and $n_1$ exchanges. By selecting
blocks from right to left, our stable BLOCKSORT
requires at most $n_2$ comparisons (the first
sublist does not become disordered) and fewer than $n$
exchanges (each of the $\sqrt{n}-1$ BLOCKSWAP
invocations puts $\sqrt{n}$ records in position). For arbitrary list and
sublist sizes, the ROTATE used to move the left small
block needs at most $n_2$ exchanges. For the series-merging
phase, fewer than $n$ comparisons and $n$ exchanges suffice.
Finally, since we use a binary search to locate the
insertion points for merging back the buffer, this
operation needs only $O(\sqrt{n}\log n)$ comparisons and no
more than $(n-\sqrt{n}) + \sum_{i=1}^{\sqrt{n}} i < 1.5n$ exchanges. Therefore,
we are guaranteed a worst-case key-comparison and
record-exchange grand total of something less than
$6.5n$.
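The general-case figure can be checked by adding up the contributions listed above. The tally below is only a bookkeeping sketch: it ignores lower-order terms, and it takes the BLOCKSORT comparison count to be at most $n_2$, which is an assumption chosen because it is consistent with the stated total rather than a figure confirmed independently.

```latex
\begin{align*}
\text{comparisons} &\le \underbrace{n_1}_{\text{extract buffer}}
  + \underbrace{n_2}_{\text{BLOCKSORT (assumed)}}
  + \underbrace{n}_{\text{series merge}}
  + O(\sqrt{n}\log n) \\
\text{exchanges} &\le \underbrace{n_1}_{\text{extract buffer}}
  + \underbrace{n}_{\text{BLOCKSORT}}
  + \underbrace{n_2}_{\text{ROTATE}}
  + \underbrace{n}_{\text{series merge}}
  + \underbrace{1.5n}_{\text{merge buffer back}} \\
\text{total} &\le 2(n_1 + n_2) + 4.5n + O(\sqrt{n}\log n)
  = 6.5n + O(\sqrt{n}\log n)
\end{align*}
```

Since several of the individual bounds are strict, the leading term stays below $6.5n$.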

For the special case in which we exhaust the first
sublist before filling the buffer, the only operation whose
constant is affected is the series-merge, which is
implemented with a series of BLOCKMERGE invocations.
In this simple scheme, we work from left to right, always
merging a series of one or more blocks with the single
block to its immediate right. Since there are only $s < \sqrt{n}$
distinct keys in the first sublist, we shall in this case
employ two binary searches to locate insertion points for
merging (the first search on the series or block from the
first sublist, the second search on the other) and require
only $O(\sqrt{n}\log n)$ comparisons. As for exchanges, we
observe that no record is moved to the right more than
once. Each distinct key in the first sublist gives rise to at
most one invocation of ROTATE, except when such a
key is represented in two distinct first-sublist series (it
cannot be in three or more), which can happen at most
$s/2$ times, each time giving rise to at most one more
ROTATE operation. Since each ROTATE moves at
most $n/s$ records to the left, a total of at most
$n + 1.5s(n/s) = 2.5n$ exchanges are required. Hence, we
are assured a worst-case key-comparison and
record-exchange grand total bounded above by $7n$.
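As a quick consistency check, the special-case total follows from the general-case figure of $6.5n$ by swapping out the cost of the series-merging phase. This is a sketch of the arithmetic only; lower-order terms are dropped.

```latex
\begin{align*}
\text{ROTATE invocations} &\le s + \tfrac{s}{2} = 1.5s \\
\text{exchanges in series merge} &\le
  \underbrace{n}_{\text{moves to the right}}
  + \underbrace{1.5s \cdot \tfrac{n}{s}}_{\text{moves to the left}} = 2.5n \\
\text{grand total} &\le
  \underbrace{6.5n}_{\text{general case}}
  - \underbrace{(n + n)}_{\text{general series merge}}
  + \underbrace{2.5n}_{\text{special series merge}}
  + O(\sqrt{n}\log n) \approx 7n
\end{align*}
```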

For comparison, consider previously-published
methods to solve this problem.2,23 Curiously, these
works focus only on establishing the existence of
algorithms, and include no constant of proportionality
analysis. However, we have studied the intricate details
of the general method they use, as described in full in,
and have found that they yield a worst-case
key-comparison and record-exchange grand total in excess of
$15n$. Perhaps more importantly, we observe that our
approach is dramatically simpler. As one rough estimate
of the cost of stability, we remark that the
key-comparison and record-exchange total of our underlying,
unstable merge was bounded above by $3.5n$ in Ref. 9.

5. STABLE IN-PLACE SORTING
5.1 The Direct Merge-Sort Approach

We can, naturally, now take the simple course
suggested in Ref. 23 and directly use our stable, in-place
linear-time merge as a subroutine for merge-sorting. We
observe that this gives rise to a key-comparison and
record-exchange total bounded above by $7n\log_2 n$, plus
lower order terms (elementary combinatorics guarantees
that our merge's sublinear terms, the largest of which is
$O(\sqrt{n}\log n)$, give rise to terms of at most $O(n)$ in the
resulting 'divide and conquer' merge-sort scheme).

As with the traditional, memory-dependent
merge-sort, this scheme will be more effective in practice if we
use a less-complicated, quadratic-time sort when subfile
sizes fall below some established 'break-even point' that
depends on a number of factors local to a given sorting
environment. Even so, a lot of time will be spent in
extracting an internal buffer at each call of the merge
subroutine. We shall demonstrate in the next subsection
that this effort can be avoided as long as no single key is
permitted to dominate the file.
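The direct approach can be sketched as a bottom-up merge sort that calls a stable merge subroutine and falls back to a quadratic-time sort below a break-even point. In the sketch below, `stable_merge` is a deliberately simple stand-in that uses a temporary list; it is not the in-place merge described in Section 4, and the names `merge_sort`, `stable_merge`, `insertion_sort` and the `cutoff` value are illustrative assumptions.

```python
def insertion_sort(a, lo, hi, key):
    """Stable quadratic-time sort of a[lo:hi], used below the break-even point."""
    for i in range(lo + 1, hi):
        r, j = a[i], i
        while j > lo and key(r) < key(a[j - 1]):
            a[j] = a[j - 1]
            j -= 1
        a[j] = r


def stable_merge(a, lo, mid, hi, key):
    """Stand-in merge of sorted a[lo:mid] and a[mid:hi].

    Assumption: uses O(mid - lo) extra space; a constant-space stable
    merge would be substituted here in the scheme the text describes.
    """
    left = a[lo:mid]
    i, j, out = 0, mid, lo
    while i < len(left) and j < hi:
        # Ties go to the left run, which is what keeps the merge stable.
        if key(a[j]) < key(left[i]):
            a[out] = a[j]
            j += 1
        else:
            a[out] = left[i]
            i += 1
        out += 1
    a[out:out + len(left) - i] = left[i:]


def merge_sort(a, key=lambda r: r, cutoff=8):
    """Bottom-up merge sort: small runs by insertion sort, then doubling merges."""
    n = len(a)
    for lo in range(0, n, cutoff):
        insertion_sort(a, lo, min(lo + cutoff, n), key)
    width = cutoff
    while width < n:
        for lo in range(0, n - width, 2 * width):
            stable_merge(a, lo, lo + width, min(lo + 2 * width, n), key)
        width *= 2


records = "G1 E1 G2 B1 E2 E3 D1 D2 A1 C1 A2 E4 H1 C2 D3 H2 B2 F1 A3 C3".split()
merge_sort(records, key=lambda r: r[0], cutoff=4)
print(" ".join(records))
```

The records are the keys of the example list used later in Section 5.2, with the letter as the key and the digit recording the original order among duplicates, so stability can be read off the output directly.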

5.2 A Nontrivial Sort-by-Merging Strategy

Consider an environment in which no key is duplicated
more than about $\sqrt{n}$ times, which may be plausible in
many settings. With this restriction, we now outline a
method to sort stably by merging so as to bypass much
of the overhead involved in a direct merge-sorting
scheme. (Nevertheless, without this restriction we must
endorse instead the direct merge-sort approach. That is,
we have found no general mechanism by which our
nontrivial merge-sort described below can stably handle
large numbers of homogeneous blocks in its later passes
without overstepping our professed goal of presenting
relatively simple and practical algorithms.)

To facilitate discussion, suppose that $n$ is of the form
$2^{2k}+2^k$ for some positive integer $k$. We assume that no
single key is represented more than $2^k$ times.
Consequently, we can use blocks of size $2^k$ since there are more
than $2^k$ distinct keys available for the buffer. (No
laborious discussion of implementation details is
necessary. If $n$ is not of the proper form, we merely determine
the value of $k$ for which $2^{2k}+2^k < n < 2^{2(k+1)}+2^{k+1}$. Our
restriction becomes that no single key is represented
more than $2^{k-1}$ times, insuring blocks of size $2^{k+1}$,
which will do.) Figure 1(a) depicts such a list with $k = 2$
and $n = 20$. Only record keys are listed, denoted by
capital letters. Subscripts are included to keep track of
duplicate keys as the algorithm progresses.

The first step of the algorithm is to fill an internal
buffer of size $2^k$ with records having distinct keys. Thus
we seek to convert L into the form BA, where B is the
buffer and A = L - B, such that STABLESORT(L)
= STABLEMERGE(B, STABLESORT(A)). To do this,
we perform a left-to-right scan of L, growing B as a
sorted sublist. The first record of L is placed in B. As we
scan the $i$th record, $i > 1$, we conduct a binary search on
B to see if its key is already present. If so, we go on to
scan the next record. If not, we ROTATE the appropriate
segment of L so that B's rightmost record occupies
position $i-1$. We then insert the new record into B. As
soon as B is filled, we invoke ROTATE to make B a
prefix of L. Fig. 1 illustrates how this process modifies
our example list of 20 elements. We have used only $O(1)$
extra space, $O(n)$ exchanges and $O(n\log 2^k) = O(nk)$
comparisons.

G1 E1 G2 B1 E2 E3 D1 D2 A1 C1 A2 E4 H1 C2 D3 H2 B2 F1 A3 C3
a) Example list L, with k = 2 and n = 20.

E1 G1 G2 B1 E2 E3 D1 D2 A1 C1 A2 E4 H1 C2 D3 H2 B2 F1 A3 C3
  B
b) First two buffer elements are found.

G2 B1 E1 G1 E2 E3 D1 D2 A1 C1 A2 E4 H1 C2 D3 H2 B2 F1 A3 C3
      B
c) Third buffer element is found.

G2 E2 E3 B1 D1 E1 G1 D2 A1 C1 A2 E4 H1 C2 D3 H2 B2 F1 A3 C3
              B
d) Buffer is filled.

B1 D1 E1 G1 G2 E2 E3 D2 A1 C1 A2 E4 H1 C2 D3 H2 B2 F1 A3 C3
     B
e) Buffer is repositioned.

Figure 1. Filling the internal buffer, B.
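The buffer-filling scan can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `extract_buffer` and `rotate_left` are hypothetical, ROTATE is realised with the familiar three-reversal trick, and records are strings such as `"G1"` whose first character is the key.

```python
def rotate_left(a, lo, mid, hi):
    """In place, turn a[lo:mid] + a[mid:hi] into a[mid:hi] + a[lo:mid]."""
    def reverse(i, j):
        j -= 1
        while i < j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    reverse(lo, mid)
    reverse(mid, hi)
    reverse(lo, hi)


def extract_buffer(a, size, key=lambda r: r):
    """Collect up to `size` distinct-keyed records into a sorted prefix of a.

    Returns the number of buffer records found. The records left outside
    the buffer keep their original relative order.
    """
    start, length = 0, 0          # buffer is a[start:start+length], sorted
    for i in range(len(a)):
        if length == size:
            break
        k = key(a[i])
        lo, hi = start, start + length
        while lo < hi:            # binary search for k in the buffer
            m = (lo + hi) // 2
            if key(a[m]) < k:
                lo = m + 1
            else:
                hi = m
        if lo < start + length and key(a[lo]) == k:
            continue              # key already present: skip this record
        skipped = i - (start + length)
        # Slide the buffer right so that its last record sits at i - 1.
        rotate_left(a, start, start + length, i)
        start += skipped
        # Insert a[i] at its sorted position inside the buffer.
        rotate_left(a, lo + skipped, i, i + 1)
        length += 1
    rotate_left(a, 0, start, start + length)   # make the buffer a prefix
    return length


L = "G1 E1 G2 B1 E2 E3 D1 D2 A1 C1 A2 E4 H1 C2 D3 H2 B2 F1 A3 C3".split()
found = extract_buffer(L, 4, key=lambda r: r[0])
print(found, " ".join(L))
```

Run on the example list with a buffer of size 4, the sketch passes through the states shown in Figure 1(b) to 1(e) and ends with the buffer B1 D1 E1 G1 as a prefix.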

In the second step of the algorithm, we use B to
conduct the first $k+1$ passes of a merging sort. We first
employ the rightmost buffer element to conduct a left-to-right
pass of A, producing a sequence of sorted
two-record sublists as we go. The second merging pass is done
with the rightmost two remaining buffer elements, this
time producing sorted four-element sublists, and so on.
The size of B ensures that this simple strategy suffices for
$k$ passes, each doubling the length of the sorted sublists.
(Since $2^k = \sum_{i=1}^{k} 2^{i-1} + 1$, pass $i$ is performed with $2^{i-1}$
buffer elements for $1 \le i < k$, but in pass $k$ we use
$2^{k-1}+1$ elements, one more than we really need.) Now
that B is reassembled as a suffix of L, we proceed to use
it in a right-to-left fashion to perform merging pass $k+1$.
Therefore BA has been transformed into BC, where C
contains $2^{k-1}$ sorted sublists, each of size $2^{k+1}$. See Figure
2. No more than $O(1)$ space and $O(nk)$ time has been
used.
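One way to picture a left-to-right pass is as a sequence of exchange-based merges in which the buffer sits immediately to the left of the two runs being merged and is 'rolled' past them. The sketch below makes that concrete under stated assumptions: the function names are hypothetical, both runs are assumed to be no longer than the buffer piece in use, and only the forward passes are shown (not the right-to-left pass $k+1$).

```python
def merge_with_buffer(a, buf, size, mid, end, key):
    """Merge sorted runs a[buf+size:mid] and a[mid:end] by exchanges only.

    a[buf:buf+size] holds buffer records. Afterwards the merged run starts
    at a[buf] and the (permuted) buffer occupies a[end-size:end].
    Assumes neither run is longer than `size`.
    """
    out, i, j = buf, buf + size, mid
    assert mid - i <= size and end - mid <= size
    while i < mid and j < end:
        if key(a[j]) < key(a[i]):          # strict test: ties favour the left run
            a[out], a[j] = a[j], a[out]
            j += 1
        else:
            a[out], a[i] = a[i], a[out]
            i += 1
        out += 1
    while i < mid:                         # left run remainder
        a[out], a[i] = a[i], a[out]
        i += 1
        out += 1
    while j < end:                         # right run remainder
        a[out], a[j] = a[j], a[out]
        j += 1
        out += 1


def merge_pass(a, buf, size, run, end, key):
    """Merge consecutive pairs of runs of length `run` in a[buf+size:end].

    Returns the new start position of the buffer piece.
    """
    while buf + size < end:
        mid = min(buf + size + run, end)
        hi = min(mid + run, end)
        merge_with_buffer(a, buf, size, mid, hi, key)
        buf = hi - size
    return buf


L = "B1 D1 E1 G1 G2 E2 E3 D2 A1 C1 A2 E4 H1 C2 D3 H2 B2 F1 A3 C3".split()
first = lambda r: r[0]
merge_pass(L, 3, 1, 1, 20, first)   # pass 1: rightmost buffer record, runs of 1
print(" ".join(L))
merge_pass(L, 0, 3, 2, 19, first)   # pass k = 2: remaining 3 records, runs of 2
print(" ".join(L))
```

After the first call the list matches Figure 2(b). After the second call the sixteen non-buffer records match Figure 2(c); the buffer records end up at the right in some permuted order, which is harmless because the buffer is sorted again in the final step.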


B1D1E1G1  G2 E2 E3 D2 A1 C1 A2 E4 H1 C2 D3 H2 B2 F1 A3 C3
   B
a) Example list after buffer is filled.

B1 D1 E1  E2G2  D2E3  A1C1  A2E4  C2H1  D3H2  B2F1  A3C3  G1
b) First pass is performed.

D2E2E3G2  A1A2C1E4  C2D3H1H2  A3B2C3F1  B1D1E1G1
c) Second pass is performed.

E1D1B1G1  A1A2C1D2E2E3E4G2  A3B2C2C3D3F1H1H2
   B                     C
d) Pass k + 1 = 3 is performed.

Figure 2. The first k + 1 merging passes.

E1D1B1G1  A1A2C1D2  E2E3E4G2  A3B2C2C3  D3F1H1H2
   B
Figure 3(a). Example list after pass k + 1 = 3.



For the third step of the algorithm, it is helpful to
think of C as a collection of $2^k$ sorted blocks, each of size
$2^k$. In pass $k+2$, we use B to obtain $2^{k-2}$ sublists, each of
size $2^{k+2}$, as follows. Let X and Y denote a pair of sublists
in C to be merged. We first locate the block of X whose
head contains the smallest key in X. Let $X_1$ denote this
block. Let $Y_1$ denote the corresponding block of Y.
(Note: We will search X and Y for these and all
remaining merging blocks as they are needed. Although
blocks will always be sorted internally, they will in
general become unordered within a sublist with respect
to each other. This turns out to be advantageous in the
long run, requiring but a single BLOCKSORT after the
final pass rather than a series of BLOCKSORTs at each
pass that would result in a great deal of unnecessary
record movement.)

$X_1$ and $Y_1$ are now merged into the buffer's block until
it  is  filled.  Whenever  a  block  is  filled,  we  must  determine 
the  next  block  to  fill,  as  follows.  If  the  buffer  is  now 
contained  within  one  block,  then  that  block  is  filled  next. 
Otherwise,  the  buffer  must  be  split  into  two  pieces,  one  in 
a  block  of  X  and  the  other  in  a  block  of  Y.  If  one  piece 
is  a  suffix  for  its  block  (both  cannot  be),  then  we  resume 
the  merge  at  that  block.  If  not,  then  each  piece  must  be 
a  prefix  for  its  block,  and  we  resume  the  merge  at  the 
block  with  the  smaller  tail,  ties  broken  in  favour  of  the  X 
block.  After  all  blocks  of  X  and  Y  are  merged  in  this 
manner,  the  buffer  (now  in  one  block)  is  moved  back  to 
its  original  position  and  we  begin  to  merge  the  next  pair 
of  sublists  in  the  same  fashion. 

This procedure is repeated in every subsequent pass,
each time with half as many sublists, each sublist with
twice as many blocks, until pass $2k$, which is the last. See
Figure 3. In this step, we have used at most constant
extra space and each of the $k-1$ passes needs only linear
time.


A1A2A3B2    _E1D1_C1D2  E2E3E4G2    _B1G1_C2C3  D3F1H1H2
block 1
b) First block is merged (buffer elements underscored).

A1A2A3B2    _E1D1B1G1_  E2E3E4G2    C1C2C3D2    D3F1H1H2
block 1                             block 2
c) Second block is merged (buffer elements underscored).

A1A2A3B2    D3E2E3E4    _D1B1G1_G2  C1C2C3D2    _E1_F1H1H2
block 1     block 3                 block 2
d) Third block is merged (buffer elements underscored).

A1A2A3B2    D3E2E3E4    F1G2H1H2    C1C2C3D2    E1D1G1B1
block 1     block 3     block 4     block 2     B
e) Fourth block is merged.

E1D1G1B1    D3E2E3E4    F1G2H1H2    C1C2C3D2    A1A2A3B2
B           block 3     block 4     block 2     block 1
f) Buffer is repositioned.

Figure 3. Merging pass k + 2 = 2k.


After pass $2k$, BC has been replaced by BD, where D
(like C) contains $2^k$ sorted blocks, each of size $2^k$.
However, we can now complete our stable sort of L by
stably sorting B, sorting D with a stable BLOCKSORT,
and stably merging B with D. See Figure 4. Thus the
entire strategy runs in $O(n\log n)$ time and $O(1)$ extra
space.


E1D1G1B1    D3E2E3E4    F1G2H1H2    C1C2C3D2    A1A2A3B2
B           D
a) Example list after final merging pass.

B1D1E1G1    D3E2E3E4    F1G2H1H2    C1C2C3D2    A1A2A3B2
B           D
b) B is sorted.

B1D1E1G1    A1A2A3B2    C1C2C3D2    D3E2E3E4    F1G2H1H2
B           D
c) D is sorted by blocks.

A1 A2 A3 B1 B2 C1 C2 C3 D1 D2 D3 E1 E2 E3 E4 F1 G1 G2 H1 H2
d) B and D are merged.

Figure 4. Completing the sort.

We observe that major factors that make this scheme
so much faster than a direct merge-sort implementation
are these: (1) we extract the internal buffer only once, not
at every merge operation, (2) we use the buffer in a novel
and very efficient fashion for passes 1 through $k+1$ as we
break it into advantageously-sized pieces and pass them
across the file, and (3) we avoid unnecessary record
movement by delaying the use of BLOCKSORT until the
final step.

5.3  Constant  of  Proportionality  Bounds 

As in Section 4.5, we focus on key comparisons and
record exchanges, concentrating on the constant of
proportionality for the leading (this time, $O(n\log n)$) time
complexity term.

Filling the buffer requires no more than
$n\log_2\sqrt{n} = 0.5\,n\log_2 n$ comparisons and only a linear number
of exchanges. (When $n$ is not of the special form, this is
the only step whose time complexity may increase, since
a bigger buffer is used. Even so, the buffer's size is no
more than doubled, thereby affecting at most the linear
term.) The first merging pass needs at most $n/2$
comparisons and $n$ exchanges. In general, pass $i$,
$1 \le i \le k+1$, needs at most $(n - n/2^i)$ comparisons and
$n$ exchanges. The last merging pass, pass $2k$, needs at
most $n + n/2 + O(\sqrt{n})$ comparisons ($n$ to merge, $n/2 +
O(\sqrt{n})$ to search for the correct blocks) and $n$ exchanges.
In general, pass $2k+1-j$, $1 \le j \le k-1$, needs at most
$(n + n/2^j) + O(\sqrt{n})$ comparisons and $n$ exchanges.
Therefore, we can balance these leading terms, deriving a cost
for the merging passes bounded above by $2kn < n\log_2 n$
comparisons and $n\log_2 n$ exchanges. Linear time suffices
for the final merging and sorting steps. We conclude that
we are guaranteed a worst-case key-comparison and
record-exchange grand total not greater than $2.5\,n\log_2 n$.
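The 'balancing' of leading terms can be made explicit by summing the per-pass comparison bounds. The derivation below is a sketch of that arithmetic only; it uses the pass ranges $1 \le i \le k+1$ and $1 \le j \le k-1$ and the fact that $n = 2^{2k}+2^k$ implies $2k < \log_2 n$.

```latex
\begin{align*}
\sum_{i=1}^{k+1}\Bigl(n - \frac{n}{2^{i}}\Bigr)
  &= kn + \frac{n}{2^{k+1}}, \\
\sum_{j=1}^{k-1}\Bigl(n + \frac{n}{2^{j}}\Bigr)
  &= kn - \frac{n}{2^{k-1}}, \\
\text{merge comparisons} &\le 2kn - \frac{3n}{2^{k+1}} + O(k\sqrt{n})
  < n\log_2 n \quad\text{(to leading order)}, \\
\text{merge exchanges} &\le 2k \cdot n < n\log_2 n, \\
\text{grand total} &\le
  \underbrace{0.5\,n\log_2 n}_{\text{fill buffer}}
  + \underbrace{n\log_2 n}_{\text{comparisons}}
  + \underbrace{n\log_2 n}_{\text{exchanges}}
  + O(n) = 2.5\,n\log_2 n + O(n).
\end{align*}
```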

This worst-case total compares favourably with
average-case key-comparison and record-exchange totals for
popular unstable methods: quick-sort's average-case
figure is a little more than $1.4\,n\log_2 n$; heap-sort's is
about $2.3\,n\log_2 n$. (These values are derived from the
analysis in Ref. 14, where we count a single record
movement at one third the cost of a two-record
exchange.)

6.  DIRECTIONS  FOR  CONTINUED 
RESEARCH 

We have presented relatively straightforward and
efficient stable merging and sorting strategies that
simultaneously optimize both time and space (to within
a constant factor). The upper bounds we have derived on
constants of proportionality are probably overly
pessimistic, representing extreme and possibly unrealizable
cases hardly representative of expected behaviour. On
the other hand, we again remind the reader that we have
brushed aside lower-order terms that can be significant in
practice, especially for small files.

Given  the  obvious  importance  of  merging  and  sorting, 
a  next  logical  step  along  this  general  line  of  investigation 
might  be  a  thorough  testing  of  careful  implementations 
of these algorithms. A great number of factors would
likely be relevant to such an empirical study, including
the frequency distribution of keys, the percentage of
duplicate keys present, the initial sortedness of files, the
break-even point at which less sophisticated schemes for
small subfiles are used, record length, computer
architecture and so on.

Optimistically,  we  observe  that  even  simpler,  more 
effective  optimal  time  and  space  strategies  are  very 
possible.  Also,  the  design  and  analysis  of  time-space 
optimal  parallel  algorithms  is  a  subject  of  obvious 
importance  for  future  study.8  As  block  rearrangement 
strategies  become  more  widely  known,  we  hope  that  their 
practical  potential  for  merging,  sorting,  duplicate-key 
extraction  and  related  file-processing  operations  will 
begin to be better understood.

Book Reviews


Andrew Harter. Three-dimensional Integrated
Circuit Layout. Cambridge University
Press. £25. ISBN 0 521 41630 2.

The  commercial  use  of  three-dimensional 
integrated circuit (IC) technologies is not yet
with us despite the fact that suitable device
structures are beginning to emerge. The
organisation of a three-dimensional IC
involves the use of vertical layers of devices
separated by insulation planes, and can be
seen as a development of the use of the
silicon-on-insulator structures already encountered in
conventional two-dimensional technologies.
Potential benefits include higher packing
density and speed. This book is not concerned
with  the  development  of  new  technologies  of 
this  class,  but  rather  seeks  to  address  the 
question  of  how  to  develop  an  IC  layout 
strategy  that  fully  exploits  the  potential  of  the 
structure,  given  that  it  is  available. 

The  book  represents  the  author’s  doctoral 
thesis  and  is  one  of  a  few  published  annually 
on  the  basis  of  selection  by  a  review  panel  set 
up  jointly  by  the  Conference  of  Professors  of 
Computer  Science  and  the  British  Computer 
Society.  The  panel's  aim  is  to  select  for  wider 
dissemination  British  PhD  research  theses  of 
outstanding  calibre,  both  in  content  and 
technical  significance.  The  author  presents,  as 
might  be  expected  given  the  book's  pedigree,  a 
most readable and informative view of the
emergence of the three-dimensional
technologies, setting this account into the context
of conventional two-dimensional processes.
The presentation of the many positive benefits
of these technologies is balanced by a thorough
analysis of their drawbacks and fabricational
problems, including yield, heat dissipation
and electrical parasitic effects. The industry
has shown an extraordinary ability over three
decades to overcome problems of exactly this
type, and it seems entirely reasonable to
suppose that these limitations will be dealt
with effectively in time. Harter establishes the
additional complexity and richness of
connectivity available to the user and then
considers the development of appropriate
layout methods. One of the results of his
investigation is a very comprehensive
development and evaluation of a novel
abutment-based layout scheme, which can be viewed as
a generalisation of the two-dimensional
abutment system commonly used in silicon
compilers and cell-based automatic or
semi-automatic design environments. Issues such as
vertical scaling and the optimum number of
layers that should be used are addressed. If,
indeed, the taxing processing problems
introduced are eventually overcome, and if the
ever-growing need for on-chip capacity is not
met more effectively by one of the other novel
process developments, this work will form a
most important starting point for the practical
use of three-dimensional device technologies.

As  a  research  monograph  the  book  is  not, 
of  course,  an  undergraduate  text  but  can  be 
recommended wholeheartedly to the
researcher and practising engineer working in
the  field.  The  style  adopted  by  the  author  also 
makes  the  book  accessible  to  the  manager 
who  is  looking  for  a  comprehensive  overview 
of  this  topic,  which  promises  much  for  future 
generations  of  advanced,  high-density  ICs. 
Very  full  referencing  is  provided  for  the  reader 
who  wants  to  take  any  particular  topic  further. 
Finally,  as  a  model  of  how  to  write  a  PhD 
thesis,  the  book  is  exemplary;  at  once  deep, 
thorough  and  far-sighted,  the  author  does  not 
fail  to  address  the  downsides  as  well  as  the 
upsides  of  his  topic  and  presents  reflections 
that are all the more substantial and valuable
as a result.

R. E. Massara
University of Essex

F. Brackx and D. Constales
Computer Algebra with LISP and REDUCE
Kluwer Academic, Dordrecht, The Netherlands
ISBN 0-7923-1441-7, £54.

In  recent  years  the  heavy  memory  demands  of 
computer  algebra  (CA)  systems  have  ceased  to 
be  a  serious  handicap.  The  resulting  ready 
availability  of  such  systems,  more  than  20 
years  after  their  genesis,  has  at  last  stimulated 
authors  and  publishers  to  bring  out  texts  on 
the  field.  Like  another  recent  book  of  which  I 
am  a  co-author,  this  one  deals  with  Reduce, 
which  is  one  of  the  two  CA  systems,  both 
Lisp-based  and  still  under  development,  to 
survive  from  the  1960s  (the  other  being 
Macsyma).  Reduce,  like  its  competitors,  has 
merits  and  demerits,  and  certain  unique 
features;  no  one  system  is  best  for  all  purposes. 

The  book  is  based  on  lecture  courses  given 
at  various  European  universities,  and  is  clearly 
intended  to  introduce  Reduce  to  novices.  It 
begins  with  a  brief  introduction  to  CA, 
defining  it  and  contrasting  it  with  conventional 
programming languages, and includes a
somewhat inaccurate description of other CA
languages. 

Reduce is written in Rlisp, which is
essentially an extension of Standard Lisp written
in an ALGOL-like syntax. Reduce itself has
two modes. The algebraic mode, which is the
normal one for direct use, parses mathematical
expressions and presents a functional
programming paradigm. The symbolic mode is
Lisp-like,  and  is  used  for  increased  efficiency 
in  programming  because  it  enables  more  direct 
access  to  data  structures. 

After  their  introduction,  the  authors  plunge 
straight  into  Standard  Lisp.  The  description 
contains some inaccuracies; more
importantly, it looks at Lisp itself from a misleading
viewpoint  in  that  it  stresses  low-level  functions 
and  does  not  adequately  deal  with  Lisp’s  data 
abstractions  or  with  the  actual  ways  in  which 
Reduce  uses  Lisp  data  structures.  As  a  result 
this  part,  85  pages  long,  fails  to  connect  with 
the  later  chapters;  it  does  not  give  sufficient 
information  about  how  Reduce  works  to 
enable  a  programmer  to  begin  to  program  in 
symbolic  mode. 

The  main  part  of  the  book  (95  pages)  is  a 
description of algebraic-mode Reduce (which
I  feel  would  be  a  more  natural  starting  point 
for  new  users  interested  in  applications).  The 
format  is  rather  like  a  reference  manual, 
defining,  describing  and  illustrating  functions 
individually,  without  much  guidance  on  how 
best  to  combine  them  or  what  difficulties  the 
user  might  encounter.  Curiously,  although  the 
book  came  out  late  enough  to  cover  the  latest 
Reduce version, 3.4, it does not fully exploit
either  the  new  functions  such  as  those  for  local 
substitution  rulesets  or  the  new  packages  now 
distributed. 

Finally, the authors give 50 pages of
examples, covering a functional iteration, an
algebra of projection operators, the Gröbner
bases package for polynomial ideals, and the
Clifford algebra arising from 3-dimensional
vectors. Source code is given, except for the
Gröbner bases and the later parts of the
Clifford algebra discussion (the latter being
thereby made rather pointless), but little advice
or guidance on programming appears, and the
reasons for the authors' choices of implementations
are not discussed. Nor is their
choice of subtitle entirely clarified, since there
are many issues of interest to pure mathematicians
that are not covered or even
mentioned, the range of examples that do
appear is restricted, and there is no indication
of which areas of mathematics lend themselves
to treatment by CA and which do not.

As  an  author  of  a  rival  text,  I  do  not  feel 
very  threatened  by  this  competitor,  even 
though  it  contains  some  useful  remarks  and 
examples.  It  is  also  quite  expensive. 

M.  A.  H.  MacCallum 
London 
We present parallel algorithms for the elementary binary set
operations that, given an EREW PRAM with k processors, operate
on two sorted lists of total length n in O(n/k + log n) time and
O(k) extra space and are thus time-space optimal for any value of
k ≤ n/(log n). Our methods are stable, require no information
other than a record’s key, and do not modify records as they

1. INTRODUCTION

The design and analysis of optimal parallel file rearrangement
algorithms have long been topics of widespread
attention. The vast majority of the published literature
has concentrated on the search for algorithms that
are time optimal, that is, those that achieve optimal
speedup (see, for example, [1]). Unfortunately, space
management issues have often taken a back seat in these
efforts, leaving those who seek to implement optimal parallel
algorithms unable to do so with any reasonable,
bounded number of processors.

In recent work, however, parallel merge and sort
methods that simultaneously optimize both time and
space have been devised [2]. Such time-space optimal
algorithms attain optimal speedup, yet require only a
constant amount of extra space per processor, even when
the number of processors is fixed. Just what scope of file
rearrangement problems is amenable to time-space optimal
parallel techniques? In this paper we provide a partial
answer to this question, developing time-space optimal
parallel algorithms for the elementary binary set
operations, namely, set union, intersection, difference,
and exclusive or.

* A preliminary version of a portion of this paper was presented at
the International Conference on Databases, Parallel Architectures and
Applications (PARBASE-90), held in Miami Beach, Florida, in March
1990.

† This research has been supported in part by the National Science
Foundation under Grant MIP-8919312 and by the Office of Naval Research
under Contract N00014-90-J-1855.

To accomplish our goal, we devise a new parallel select
procedure, reducing the general problem to one of a
series of disjoint local operations, one for each processor,
on which we can exploit sequential methods. Given
an EREW PRAM with k processors, our algorithms operate
on two sorted lists of total length n in O(n/k + log n)
time and O(k) extra space and are thus time-space optimal
for any value of k ≤ n/(log n). For the sake of complete
generality, our algorithms are stable (records with
identical keys retain their original relative order), do not
modify records (even temporarily) as they execute, and
require no information other than a record’s key.

2.  TIME-SPACE  OPTIMAL  PARALLEL  SELECT  ON  THE 
EREW  PRAM  MODEL 

Given two sorted lists L1 and L2, our goal is to transform
L1 into two sorted sublists L3 and L4, where L3
consists of the records whose keys are not found in L2
and L4 consists of the records whose keys are. Thus we
accept L = L1L2 and select records from L1 whose keys
are contained in L2, accumulating them in L4, where our
output is of the form L3L4L2.
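The input/output contract of select can be illustrated with a short sequential sketch. This is only an assumption-laden illustration, not the in-place routine the paper cites: it builds new lists rather than rearranging L1 in place, and the function name `select` and the `key` parameter are hypothetical.

```python
def select(l1, l2, key=lambda rec: rec):
    """Split sorted l1 into (l3, l4) by a stable two-pointer scan.

    l3 keeps the records of l1 whose keys do not occur in l2;
    l4 keeps the records of l1 whose keys do occur in l2.
    Unlike the routine described in the text, this sketch is NOT in place.
    """
    l3, l4 = [], []
    j = 0
    for rec in l1:
        k = key(rec)
        # Advance through l2 until its current key is at least k.
        while j < len(l2) and key(l2[j]) < k:
            j += 1
        if j < len(l2) and key(l2[j]) == k:
            l4.append(rec)
        else:
            l3.append(rec)
    return l3, l4


l3, l4 = select([1, 1, 3, 4, 6], [1, 4, 5])
```

Because each list is scanned once from left to right, records with equal keys keep their original relative order in both outputs.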

Our parallel algorithm comprises four steps: local selecting,
series delimiting, blockifying, and block rearranging.
To facilitate discussion, we temporarily assume
that the number of records of each type (L1, L2, L3, and
L4) is evenly divisible by k, where k denotes the number
of processors available.

Local Selecting. We first view L as a collection of k
blocks, each of size n/k, and associate a distinct processor
with each block. We seek to treat each L1 block L1_i as
if it were the only block in L1, transforming its contents
into the form L3_i L4_i.

Our first task in this step is to determine where each
tail (rightmost element) of each L1 block would go if the
tails alone were to be merged with L2. In order to make
this determination efficiently on the EREW model, we
direct each L1 processor to set aside four extra storage
cells (for copies of indices, offsets, and keys) and employ
the “phased merge” as described in the displacement
computing step of the merge in [2]. At most O(log n)
time and O(k) extra space have been consumed up to this
point.

As long as an L1 processor does not need to consider
more than O(n/k) L2 records (a quantity known by considering
the difference between where its block’s tail
would go and where the tail of the block to its immediate
left would go if they were to be merged with L2), we
instruct it to employ the linear-time, in-place sequential
select routine from [5]. Otherwise, in the case where an
L1 block spans several L2 blocks, we first enlist the aid of
the corresponding L2 processors to preprocess their records
(performing the time-space optimal sequential select
against the L1 block, followed by a time-space optimal
sequential duplicate-key extract [4]), then instruct
the L1 processor to perform its select (at most n/k L2
records are now needed), and finally direct the L2 processors
to restore their blocks (two time-space optimal sequential
merge operations suffice).

Thus, if we let h denote the number of blocks in L1,
the L1 list has now taken on the form
L3_1 L4_1 L3_2 L4_2 ... L3_h L4_h. This completes the local
selecting step and has required O(n/k + log n) time and
constant extra space per processor.

Series Delimiting. We now seek to divide L1 into a
collection of nonoverlapping “series,” each series with
n/k L3 records. To begin this process, we locate special
records that we term “breakers,” each of which is the
(m(n/k) + 1)th L3 record for some integer m. First we
compute prefix sums on |L3_i| to find these breakers. For
example, if sum_{i=1}^{g-1} |L3_i| < m(n/k) + 1 and
sum_{i=1}^{g} |L3_i| ≥ m(n/k) + 1, then block g contains the
mth breaker. We identify three special types of breakers.
If block i contains a breaker, but neither block i - 1 nor
block i + 1 contains breakers, then the breaker in block i
is called a “lone” breaker. If block i - 1 and block i both
contain breakers, and if block i + 1 does not contain a breaker,
then the breaker in block i is called a “trailing” breaker.
If block i and block i + 1 both contain breakers, and block
i - 1 does not contain a breaker, then the breaker in
block i is called a “leading” breaker.
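The breaker test and the three named breaker types can be sketched sequentially as follows. The sketch assumes the per-block counts |L3_i| are already known and that no block holds more than n/k L3 records, so a block contains at most one breaker. The function names and the extra label "interior" (for a breaker with breaker-holding blocks on both sides, which the text does not name) are hypothetical.

```python
from itertools import accumulate


def blocks_with_breakers(l3_counts, block_size):
    """Return one boolean per block: does the block hold a breaker?

    Breakers are the L3 records of rank 1, b + 1, 2b + 1, ... where
    b = block_size = n/k. Block g holds one exactly when some rank
    m*b + 1 lies in (prefix[g - 1], prefix[g]].
    """
    prefix = [0] + list(accumulate(l3_counts))
    flags = []
    for g in range(1, len(prefix)):
        lo, hi = prefix[g - 1], prefix[g]
        # Smallest multiple of b that is >= lo (ceiling division).
        first_multiple = -(-lo // block_size) * block_size
        flags.append(first_multiple <= hi - 1)
    return flags


def classify_breakers(flags):
    """Label each block: None, 'lone', 'trailing', 'leading', or 'interior'."""
    labels = []
    for i, has_breaker in enumerate(flags):
        left = i > 0 and flags[i - 1]
        right = i + 1 < len(flags) and flags[i + 1]
        if not has_breaker:
            labels.append(None)
        elif not left and not right:
            labels.append("lone")
        elif left and not right:
            labels.append("trailing")
        elif right and not left:
            labels.append("leading")
        else:
            labels.append("interior")  # hypothetical name, not from the text
    return labels


flags = blocks_with_breakers([1, 0, 3, 4, 4, 2, 0, 2], 4)
labels = classify_breakers(flags)
```

In a parallel setting the prefix sums would be computed with a parallel prefix routine; the sequential `accumulate` call here only stands in for that step.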

These breakers are used to divide L1 into nonoverlapping
series as follows: each series begins with a lone or
trailing breaker and ends with the record immediately
preceding the next lone or leading breaker. By design,
each series contains exactly n/k L3 records. A sample
series is depicted in Fig. 1, where we use L3'_f to denote
L3_f minus any records that precede its breaker and L3'_{g+1}
to denote L3_{g+1} minus its breaker and any records that
follow it.

... L3'_f L4_f L3_{f+1} L4_{f+1} ... L3_{g-1} L4_{g-1} L3_g L4_g L3'_{g+1} ...
    |<--------------------------- one series --------------------------->|

FIG. 1. A sample series obtained in the series delimiting step.

A processor that holds a lone or trailing breaker broadcasts
its breaker’s location to its right. After that, a processor
that holds a lone or leading breaker broadcasts its
breaker’s location to its left. (This type of broadcasting
can be efficiently accomplished on the EREW PRAM
with data distribution algorithms or parallel prefix computations.)
By this means, a processor learns the location
of the lone or trailing breaker to its immediate left and the
location of the lone or leading breaker to its immediate
right. This completes the series delimiting step and has
required O(log(n/k) + log k) time and constant extra
space per processor.

Blockifying. In this step, we first reorganize in parallel
the L1 records within every series and then reorganize
in parallel the records in the remainder of the L1 list.

Let us consider our sample series as depicted in Fig. 1.
We seek to collect the n/k L3 records in this series in
block g (and thus move the L4 records into the other
blocks and subblocks illustrated). It is a simple matter to
exchange L3'_{g+1} with the rightmost |L3'_{g+1}| records in
L4_g. Efficiently coalescing the other L3 records into
block g is much more difficult. We begin by computing
prefix sums on |L3'_f|, |L3_{f+1}|, ..., |L3_{g-2}|, |L3_{g-1}| to
obtain a “displacement table.” Table entry
E_i = sum_{j=f}^{i} |L3_j| (where the j = f term is |L3'_f|)
denotes the number of L3 records in blocks indexed f
through i that are to move to block g. It turns out that E_i
will also denote the number of L4 records that block i is
to receive from block i + 1 as our algorithm proceeds. In
Fig. 2, our sample series is shown in more detail (with g
set at f + 3) along with its corresponding displacement
table.

[Figure: sample series spanning blocks f through f + 4, with its two
breakers marked; the individual record keys are not recoverable.]

    Displacement Table

      i        E_i
      f        1
      f+1      2
      f+2      4

FIG. 2. A more detailed view of a sample series and its displacement
table.

Thus each processor i, f < i < g, now uses the displacement
table to determine exactly how the records in
its block are to be rearranged: it is to send |L3_i| records to
block g, send its first E_{i-1} L4 records (which we denote by
X_i) to block i - 1, retain its next n/k - |L3_i| - E_{i-1} L4
records (denoted by Y_i), and receive E_i L4 records (denoted
by X_{i+1}) from block i + 1. Processors f and g determine
similar information: processor f is to send |L3'_f| =
E_f records to block g and receive the same number of
records from block f + 1, and processor g is to send
|L4_g| = E_{g-1} records to block g - 1 and receive the same
number of records from blocks f through g - 1. (Note
that segments X_f and Y_g are empty.)
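The bookkeeping each interior processor derives from the displacement table can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `displacement_plan` is hypothetical, the block size of 6 in the usage line is an assumed value chosen to pair with the table entries 1, 2, 4 of Fig. 2, and only interior blocks (f < i < g) are handled.

```python
from itertools import accumulate


def displacement_plan(l3_sizes, block_size):
    """Compute the displacement table and per-block movement counts.

    l3_sizes[0] is the number of L3 records of block f that belong to the
    series; l3_sizes[t] is the L3 count of block f + t, up to block g - 1.
    Returns (table, plan), where table[t] is the entry E for block f + t and
    plan holds one tuple per interior block:
        (records sent to block g, size of X, size of Y, records received).
    """
    table = list(accumulate(l3_sizes))
    plan = []
    for t in range(1, len(l3_sizes)):       # interior blocks only
        send_to_g = l3_sizes[t]             # the block's L3 records
        x = table[t - 1]                    # first L4 records, sent to the left
        y = block_size - send_to_g - x      # L4 records that stay in the block
        receive = table[t]                  # L4 records arriving from the right
        plan.append((send_to_g, x, y, receive))
    return table, plan


table, plan = displacement_plan([1, 1, 2], 6)
```

For every interior block the number of records sent (to block g plus to the left neighbor) equals the number received, so each block still holds exactly one block's worth of records afterwards.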

To accomplish the data movement, each processor
first reverses the contents of its block and then reverses
its X, Y, and L3 segments separately, thereby efficiently
permuting its (two or) three subblocks. Each processor j,
f < j < g, now employs a single extra storage cell to copy
safely the first record of X_j to the location formerly occupied
by the first record of X_{j-1}, while processor f copies
the first record of its L3 segment to the location formerly
occupied by the first record of X_g. Data movement continues
in this fashion, with each processor moving its L3
records to block g as soon as its X segment is exhausted.
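The reversal trick for permuting subblocks in place can be sketched as below. The sketch assumes a block initially laid out as its L3 segment, then X, then Y, and shows that reversing the whole block followed by reversing each segment yields Y, X, L3 with the internal order of every segment preserved. The function names are hypothetical.

```python
def reverse_range(a, lo, hi):
    """Reverse a[lo:hi] in place using constant extra space."""
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1


def permute_subblocks(block, n_l3, n_x):
    """Turn [L3 | X | Y] into [Y | X | L3] in place by reversals."""
    n = len(block)
    n_y = n - n_l3 - n_x
    reverse_range(block, 0, n)                # whole block: Y, X, L3, each reversed
    reverse_range(block, 0, n_y)              # restore the order inside Y
    reverse_range(block, n_y, n_y + n_x)      # restore the order inside X
    reverse_range(block, n_y + n_x, n)        # restore the order inside L3
    return block


block = permute_subblocks(["a", "b", "x1", "x2", "y1"], n_l3=2, n_x=2)
```

Each record is moved at most twice, so the work is linear in the block size and no auxiliary buffer is needed.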

Note that if k is small enough (no greater than
O(max{n/k, log n})), then the displacement table can
merely be searched; if k is larger than this, then the table
may contain too many identical entries, and we invoke a
preprocessing routine to condense it (again with the aid
of broadcasting).

[Figure: three panels, “Sample Series,” “Subblock Permutations,” and
“Data Movement,” showing blocks f through f + 3; the individual record
keys are not recoverable.]

FIG. 3. Coalescing the L3 records of one series into a single block.


After the data movement is finished, it is necessary to
rotate L3_g with the records moved into block g from
block g + 1. The processing of the series is now completed,
as depicted in Fig. 3.

If block g + 1 contains a leading breaker, we next
rotate the records in an appropriate prefix of this block to
ensure that L3 records precede L4 records there.

We can now handle the records not spanned by a series.
These records are contained in zero or more nonoverlapping
“sequences” (we choose this term to avoid
confusion with “series”), where each sequence begins
with a leading breaker and ends with the record immediately
preceding the next trailing breaker. Suppose such a
sequence spans p blocks. Because there are exactly p
breakers in these blocks and because the L3 records before
the first breaker and after the last breaker have
been moved outside these blocks, there are now exactly
(p - 1)(n/k) L3 records there. Thus, there are exactly
n/k L4 records there.

If p = 2, then the two blocks have the form
L3_i L4_i L3_{i+1} L4_{i+1}, where |L4_i| = |L3_{i+1}|. Swapping L4_i
with L3_{i+1} finishes the blockifying for this sequence. If
p > 2, then we simply treat the sequence as we earlier did
each series, exchanging the roles of L3 and L4 records.
This completes the blockifying step and has required
O(n/k + log n) time and constant extra space per processor.

Block Rearranging. L3 has now become an ordered
collection of blocks interspersed with another ordered
collection that constitutes L4. We need only to rearrange
these blocks so that L3 is followed by L4. We direct each
processor to set aside a zero bit if it contains an L3 block
and to set aside a one bit otherwise. The processors now
need only compute prefix sums on these values and then
acquire their respective new blocks in parallel without
memory conflicts. This completes the block rearranging
step and has required O(n/k + log k) time and constant
extra space per processor.
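How prefix sums on the zero/one bits determine each block's destination can be sketched as follows. This is a sequential illustration under the assumption that blocks are indexed from 0 and that relative order within each collection must be preserved; the function name `block_destinations` is hypothetical.

```python
from itertools import accumulate


def block_destinations(bits):
    """Map each block index to its destination index.

    bits[i] is 0 if block i is an L3 block and 1 if it is an L4 block.
    L3 blocks are packed to the front and L4 blocks to the back, each
    collection keeping its original order.
    """
    ones_inclusive = list(accumulate(bits))
    total_ones = ones_inclusive[-1] if bits else 0
    total_zeros = len(bits) - total_ones
    dest = []
    for i, bit in enumerate(bits):
        ones_before = ones_inclusive[i] - bit
        if bit:
            dest.append(total_zeros + ones_before)  # L4 block: after all L3 blocks
        else:
            dest.append(i - ones_before)            # L3 block: count of L3 blocks before it
    return dest


dest = block_destinations([0, 1, 1, 0, 1, 0])
```

The destinations form a permutation of the block indices, which is why the blocks can be fetched in parallel without two processors touching the same location.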

3.  IMPLEMENTATION  DETAILS 

Suppose that the number of L3 (and hence L4) records
is not evenly divisible by k, in which case the last breaker
begins a series with strictly fewer than n/k L3 records.
This series is treated just as any other (although it may be
a very short one, lying entirely within the last block). In
blockifying, the L3 records in this series are collected in
the last block, so that this series becomes a (possibly
empty) collection of L4 blocks followed by (possibly just
part of) one block containing both L3 and L4 records.
After the block rearranging step, we need only move the
L3 and L4 segments from the last block into their appropriate
final positions with parallel rotations.

More generally, suppose that the number of L1 records
is not evenly divisible by k. (Note that the number of
L2 records never really needs to be evenly divisible by k.)
We transform L1L2 into the list L1^1 L1^2 L2, where L1^1
contains an integral multiple of n/k records and where
L1^2 contains strictly fewer than n/k records. We further
transform the input into the list L1^1 L2 L1^2 by means of
parallel rotations and invoke the main algorithm on
L1^1 L2, yielding L3^1 L4^1 L2 L1^2. Then a local select of L1^2
against L2 gives L3^1 L4^1 L2 L3^2 L4^2. Parallel rotations now
produce the desired result, L3^1 L3^2 L4^1 L4^2 L2 = L3L4L2.

The time and space requirements of these implementation
details are thus bounded by those of the main parallel
algorithm. This completes the description of our parallel
method. In summary, the total time spent is O(n/k + log n)
and the total extra space used is O(k). Therefore, this
select algorithm is time-space optimal for any value of
k ≤ n/(log n), thereby meeting our stated goal.

4.  TIME-SPACE  OPTIMAL  PARALLEL  SET  OPERATIONS 

In what follows, suppose we are given the input list
L = XY, where X and Y are two sublists, each sorted on
the key and each containing no duplicates. Since the
same key may naturally appear once in X and once in Y,
we insist that, in the spirit of stability, the record represented
in the result of a binary set operation be the one
that occurs first in L.

We now have stable, time-space optimal parallel subroutines
sufficient to perform the elementary binary set
operations. Select is obtained from the work of the last
two sections. Merge is obtained from [2]. Duplicate-key
extract is obtained from an easy modification to select, in
which we replace the first step, local selecting, with the
local duplicate-key extracting method of [4]. (Local duplicate-key
extract is actually easier than local select,
because the L1 processors need no information from the
L2 list.)

We invoke merge followed by duplicate-key extract to
produce X ∪ Y. We perform select to yield both X ∩ Y
and X - Y. To achieve X ⊕ Y, we invoke select on XY,
producing X1 X2 Y; rotate X2 and Y to yield X1 Y X2; perform
select on Y X2, producing X1 Y1 Y2 X2; and finally
merge X1 and Y1.
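The way the four set operations are composed from merge, select, and duplicate-key extract can be sketched sequentially. This is only an illustration of the composition: the helpers below allocate new lists rather than working in place or in parallel, records are assumed to be (key, tag) tuples, and all function names are hypothetical.

```python
def key(rec):
    return rec[0]


def merge(x, y):
    """Stable merge of two sorted lists; on equal keys the record from x comes first."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if key(y[j]) < key(x[i]):
            out.append(y[j])
            j += 1
        else:
            out.append(x[i])
            i += 1
    return out + x[i:] + y[j:]


def select(l1, l2):
    """Split l1 into (records whose keys are absent from l2, records whose keys are present)."""
    l3, l4, j = [], [], 0
    for rec in l1:
        while j < len(l2) and key(l2[j]) < key(rec):
            j += 1
        (l4 if j < len(l2) and key(l2[j]) == key(rec) else l3).append(rec)
    return l3, l4


def extract_duplicates(lst):
    """Split a sorted list into (first record of each key, later duplicates)."""
    firsts, dups = [], []
    for rec in lst:
        (dups if firsts and key(firsts[-1]) == key(rec) else firsts).append(rec)
    return firsts, dups


def union(x, y):
    return extract_duplicates(merge(x, y))[0]


def intersection(x, y):
    return select(x, y)[1]


def difference(x, y):
    return select(x, y)[0]


def exclusive_or(x, y):
    x1, x2 = select(x, y)      # X1 X2 Y
    y1, _y2 = select(y, x2)    # select on Y X2 gives Y1 Y2
    return merge(x1, y1)


X = [(1, "x"), (3, "x"), (5, "x"), (7, "x")]
Y = [(3, "y"), (4, "y"), (7, "y"), (8, "y")]
```

With these inputs, `union(X, Y)` keeps the X record for the shared keys 3 and 7, matching the rule that the record occurring first in L represents a duplicated key.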

5.  CONCLUDING  REMARKS 

Assuming only the weak EREW PRAM model, we
have presented for the first time parallel algorithms for
the elementary binary set operations that are asymptotically
time-space optimal. As a bonus, these methods immediately
extend to multisets (under several natural definitions
[5]). Although n must be large enough to satisfy
the inequality k ≤ n/(log n) for optimality, we observe
that our algorithms are also efficient in the usual sense
(their speedup is within a polylogarithmic factor of optimal)
for any value of n, suggesting that they may have
practical merit even for relatively small files. As long as
main memory remains a critical resource in many environments,
the quest for techniques that permit the efficient
use of both time and space continues to be a fertile
research domain.
ON WELL-PARTIAL-ORDER THEORY AND ITS APPLICATION TO
COMBINATORIAL PROBLEMS OF VLSI DESIGN*

MICHAEL R. FELLOWS* AND MICHAEL A. LANGSTON*

Abstract. The existence of decision algorithms with low-degree polynomial running times for a
number of well-studied graph layout, placement, and routing problems is nonconstructively proved.
Some were not previously known to be in P at all; others were only known to be in P by way of
brute force or dynamic programming formulations with unboundedly high-degree polynomial running
times. The methods applied include the recent Robertson-Seymour theorems on the well-partial-ordering
of graphs under both the minor and immersion orders. The complexity of search versions
of these problems is also briefly addressed.

Key words. nonconstructive proofs, polynomial-time complexity, well-partially-ordered sets

AMS(MOS)  subject  classifications.  68C25,  68E10,  68K05 

1. Introduction. Practical problems are often characterized by fixed-parameter
instances. In the VLSI domain, for example, the parameter may represent the number
of tracks permitted on a chip, the number of processing elements to be employed, the
number of channels required to connect circuit elements, or the load on communications
links. In fixing the value of such parameters, we help focus on the physically
realizable nature of the system rather than on the purely abstract aspects of the
model.

In this paper, we employ and extend Robertson-Seymour poset techniques to
prove low-degree polynomial-time decision complexity for a variety of fixed-parameter
layout, placement, and routing problems, dramatically lowering known time-complexity
upper bounds. Our main results are summarized in Table 1, where n denotes the number
of vertices in an input graph and k denotes the appropriate fixed parameter. (At
the referee’s urging, we also list relevant, previously published results from [5], [8], as
noted in the rightmost column of the table.)

In the next section, we survey the necessary background from graph theory and
graph algorithms that makes these advances possible. Sections 3-5 describe our results
on several representative types of decision problems, illustrating a range of techniques
based on well-partially-ordered sets. In §6, we discuss how self-reducibility can be used
to bound the complexity of search versions of these problems. A few open problems
and related issues are briefly addressed in the final section.

2. Background. Except where explicitly noted otherwise, all graphs that we
consider are finite and undirected. A graph H is less than or equal to a graph G in
the minor order, written H ≤_m G, if and only if a graph isomorphic to H can be
obtained from G by a series of these two operations: taking a subgraph and contracting
an edge. For example, the construction depicted in Fig. 1 shows that W4 ≤_m Q3.
Table 1
Main results.

General problem area    Problem                       Best previous upper bound   Our result

Circuit layout          GATE MATRIX LAYOUT            open                        O(n^2) [5]

Linear arrangement      MIN CUT LINEAR ARRANGEMENT    O(n^(k-1))                  O(n^2)
                        MODIFIED MIN CUT              O(n^k)                      O(n^2)
                        TOPOLOGICAL BANDWIDTH*        O(n^k)                      O(n^2) [8]
                        VERTEX SEPARATION             O(n^(k^2+2k+4))             O(n^2)

Circuit design          CROSSING NUMBER*              open                        O(n^3) [8]
and utilization         MAX LEAF SPANNING TREE        O(n^(2k+1))                 O(n^2)
                        SEARCH NUMBER                 O(n^(2k^2+4k+8))            O(n^2)

Embedding and routing   2-D GRID LOAD FACTOR          open                        O(n^2)
                        BINARY TREE LOAD FACTOR       open                        O(n^2)
                        DISK DIMENSION                open                        O(n^3) [5]
                        EMULATION                     open                        O(n^3) [8]

* Input restricted to graphs of maximum degree three.


[Figure: G = Q3, with a step labeled “contract” leading to H = W4.]

Fig. 1. Construction demonstrating that W4 is a minor of Q3.
[Figure: G = K1 + 2K2.]

Fig. 2. Construction demonstrating that C4 is immersed in K1 + 2K2.


Note that the relation ≤_m defines a partial ordering on graphs. A family F of
graphs is said to be closed under the minor ordering if the facts that G is in F and
that H ≤_m G together imply that H must be in F. The obstruction set for a family
F of graphs is the set of graphs in the complement of F that are minimal in the minor
ordering. Therefore, if F is closed under the minor ordering, it has the following
characterization: G is in F if and only if there is no H in the obstruction set for F
such that H ≤_m G.

Theorem 2.1 (see [25]). Graphs are well-partially-ordered^1 by ≤_m.

Theorem 2.2 (see [24]). For every fixed graph H, the problem that takes as input
a graph G and determines whether H ≤_m G is solvable in polynomial time.

Theorems 2.1 and 2.2 guarantee the existence of a polynomial-time decision algorithm
for any minor-closed family of graphs, but do not provide any details of what
that algorithm might be. Moreover, no proof of Theorem 2.1 can be entirely constructive.
For example, there can be no systematic method of computing the finite
obstruction set for an arbitrary minor-closed family F from the description of a Turing
machine that precisely accepts the graphs in F [9].

An interesting feature of Theorems 2.1 and 2.2 is the low degree of the polynomials
bounding the decision algorithms’ running times (although the constants of
proportionality are enormous). Letting n denote the number of vertices in G, the time
required to recognize F is O(n^3). If F excludes a planar graph, then F has bounded
tree-width [22] and the time complexity decreases to O(n^2).

A graph H is less than or equal to a graph G in the immersion order, written
H ≤_i G, if and only if a graph isomorphic to H can be obtained from G by a series
of these two operations: taking a subgraph and lifting^2 a pair of adjacent edges. For
example, the construction depicted in Fig. 2 shows that C4 ≤_i K1 + 2K2 (although
C4 ≰_m K1 + 2K2).
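The lifting operation and the example of C4 immersed in K1 + 2K2 can be checked with a small sketch. It assumes a multigraph stored as a `Counter` of unordered vertex pairs, takes K1 + 2K2 to be a center vertex `v` joined to the endpoints of two disjoint edges `ab` and `cd`, and, as a simplification, requires the three vertices of a lift to be distinct. The names `edge` and `lift` are hypothetical.

```python
from collections import Counter


def edge(a, b):
    """Unordered pair used as a multigraph edge key."""
    return frozenset((a, b))


def lift(edges, u, v, w):
    """Return a new multigraph in which edges uv and vw are replaced by uw."""
    if len({u, v, w}) != 3:
        raise ValueError("this sketch assumes three distinct vertices")
    uv, vw = edge(u, v), edge(v, w)
    if edges[uv] < 1 or edges[vw] < 1:
        raise ValueError("both edges uv and vw must be present")
    out = Counter(edges)
    out[uv] -= 1
    out[vw] -= 1
    out[edge(u, w)] += 1
    return +out  # unary plus drops entries whose count fell to zero


# K1 + 2K2: center v adjacent to a, b, c, d, plus the edges ab and cd.
butterfly = Counter(
    edge(*e) for e in [("v", "a"), ("v", "b"), ("v", "c"), ("v", "d"), ("a", "b"), ("c", "d")]
)

# Two lifts through v leave the 4-cycle a-b-d-c-a.
c4 = lift(lift(butterfly, "a", "v", "c"), "b", "v", "d")
```

After the two lifts the center vertex has no incident edges, so taking the subgraph on a, b, c, d gives C4. Every cycle of K1 + 2K2 is a triangle, which is why no sequence of contractions and deletions can produce a 4-cycle.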

The relation ≤_i, like ≤_m, defines a partial ordering on graphs with the associated
notions of closure and obstruction sets.

Theorem 2.3 (see [21]). Graphs are well-partially-ordered by ≤_i.


1 A partially-ordered set (X, ≤) is well-partially-ordered if (1) any subset of X has finitely many
minimal elements and (2) X contains no infinite descending chain x1 > x2 > x3 > ··· of distinct
elements.

2 A pair of adjacent edges uv and vw, with u ≠ v ≠ w, is lifted by deleting the edges uv and vw
and adding the edge uw.
The proof of the following result is original, although it has been independently
observed by others as well [20].

Theorem 2.4. For every fixed graph H, the problem that takes as input a graph
G and determines whether H ≤_i G is solvable in polynomial time.

Proof. Letting k denote the number of edges in H, we replace G = (V, E) with
G' = (V', E'), where |V'| = k|V| + |E| and |E'| = 2k|E|. Each vertex in V is replaced
in G' with k vertices. Each edge e in E is replaced in G' with a vertex and 2k edges
connecting this vertex to all of the vertices that replace e’s endpoints. We can now
apply the disjoint-connecting paths algorithm of [24], since it follows that H ≤_i G
if and only if there exists an injection from the vertices of H to the vertices of G'
such that each vertex of H is mapped to some vertex in G' that replaces a distinct
vertex from G, and such that G' contains a set of k vertex-disjoint paths, each one
connecting the images of the endpoints of a distinct edge in H. □
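The construction of G' used in the proof can be sketched directly. The sketch assumes G is given as a vertex list and a list of endpoint pairs (so parallel edges are allowed) and only builds G'; it does not attempt the disjoint-paths test itself. The function name `build_g_prime` and the tuple labels for the new vertices are hypothetical.

```python
def build_g_prime(vertices, edges, k):
    """Build G' from G = (vertices, edges) for a pattern graph with k edges.

    Every vertex v of G becomes k copies ("v", v, 0..k-1).
    Every edge of G becomes one new vertex ("e", index) joined to all
    2k copies of its two endpoints.
    """
    v_prime = [("v", v, i) for v in vertices for i in range(k)]
    e_prime = []
    for idx, (a, b) in enumerate(edges):
        middle = ("e", idx)
        v_prime.append(middle)
        for endpoint in (a, b):
            for i in range(k):
                e_prime.append((middle, ("v", endpoint, i)))
    return v_prime, e_prime


# A triangle with k = 2: expect |V'| = 2*3 + 3 and |E'| = 2*2*3.
v_prime, e_prime = build_g_prime(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")], 2)
```

The k copies of each vertex are what allow up to k of the vertex-disjoint paths in G' to pass through the same vertex of G, which is the freedom that immersion (as opposed to the minor order) permits.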

Theorems 2.3 and 2.4, like Theorems 2.1 and 2.2, only guarantee the existence of
a polynomial-time decision algorithm for any immersion-closed family F of graphs.
The method we use in proving Theorem 2.4 yields an obvious time bound of O(n^(h+6)),
where h denotes the order of the largest graph in F’s obstruction set. (There are
O(n^h) different injections to consider; the disjoint-paths algorithm takes cubic time
on G', a graph of order at most n^2.) Thanks to the next theorem of Mader, however,
we find that the bound immediately reduces to O(n^(h+3)) because the problem graphs
of interest permit only a linear number of distinct edges.

Theorem 2.5 (see [14]). For any graph H there exists a constant c_H such that
every simple graph G = (V, E) with |E| > c_H|V| satisfies G ≥_i H.
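The arithmetic behind the two time bounds can be spelled out as follows. This is a sketch of the reasoning rather than a quotation from the proof; it assumes that an input with |E| > c_H|V| for some obstruction H can be rejected at once by Theorem 2.5, so that only inputs with a linear number of edges reach the disjoint-paths step.

```latex
\begin{align*}
\text{without Theorem 2.5:}\quad
  & |V'| = k|V| + |E| = O(n^2)
    \;\Longrightarrow\;
    O(n^h)\cdot O\bigl((n^2)^3\bigr) = O(n^{h+6}),\\
\text{with Theorem 2.5:}\quad
  & |E| \le c_H |V| = O(n)
    \;\Longrightarrow\;
    |V'| = k|V| + |E| = O(n)
    \;\Longrightarrow\;
    O(n^h)\cdot O(n^3) = O(n^{h+3}).
\end{align*}
```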

We show in §4 that by exploiting excluded-minor knowledge on immersion-closed
families the time complexity for determining membership can, in many cases, be
reduced to O(n^2).

3. Exploiting the minor order. Given a graph G of order n, a linear layout of
G is a bijection $\ell$ from V to {1, 2, ..., n}. For such a layout $\ell$, the vertex separation at
location i, $s_\ell(i)$, is $|\{u : u \in V,\ \ell(u) \le i$, and there is some $v \in V$ such that $uv \in E$ and
$\ell(v) > i\}|$. The vertex separation of the entire layout is $s_\ell = \max\{s_\ell(i) : 1 \le i \le n\}$,
and the vertex separation of G is $vs(G) = \min\{s_\ell : \ell$ is a linear layout of $G\}$.
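
To make the definition concrete, here is a minimal brute-force sketch. The function names are illustrative, not from the paper, and it assumes the convention that a vertex placed at a position up to and including i is counted at location i when it has a neighbour placed after i. Enumerating all n! layouts is only feasible for very small graphs.

```python
from itertools import permutations


def layout_vertex_separation(order, edges):
    """Vertex separation of one linear layout.

    `order` lists the vertices from position 1 to n; `edges` is a list of
    (u, v) pairs. At each location i we count the vertices placed at
    positions <= i that have a neighbour placed at a position > i.
    """
    pos = {v: i for i, v in enumerate(order, start=1)}
    worst = 0
    for i in range(1, len(order) + 1):
        active = set()
        for a, b in edges:
            for u, v in ((a, b), (b, a)):
                if pos[u] <= i < pos[v]:
                    active.add(u)
        worst = max(worst, len(active))
    return worst


def vertex_separation(vertices, edges):
    """Brute-force vs(G): minimise over all layouts (tiny graphs only)."""
    return min(layout_vertex_separation(p, edges) for p in permutations(vertices))


path = [("a", "b"), ("b", "c"), ("c", "d")]
cycle = path + [("d", "a")]
print(vertex_separation("abcd", path))   # path on four vertices
print(vertex_separation("abcd", cycle))  # cycle on four vertices
```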

Given both G and a positive integer k, the NP-complete VERTEX SEPARATION
problem [13] asks whether vs(G) is less than or equal to k. It has previously
been claimed that VERTEX SEPARATION can be decided in $O(n^{k^2+2k+4})$ time [4],
and is thus in P for any fixed value of k. We now prove that the problem can be
solved  in  time  bounded  by  a  polynomial  in  n,  the  degree  of  which  does  not  depend 
on  k. 

Theorem 3.1. For any fixed k, VERTEX SEPARATION can be decided in
$O(n^2)$ time.

Proof. Let k denote any fixed positive integer. We show that the family F of
“yes” instances is closed under the minor ordering. To do this, we must prove that if
$vs(G) \le k$ then $vs(H) \le k$ for every $H \le_m G$. Without loss of generality, we assume
that H is obtained from G by exactly one of these three actions: deleting an edge,
deleting an isolated vertex, or contracting an edge.

If H is obtained from G by deleting an edge, then $vs(H) \le vs(G) \le k$ because
the vertex separation of any layout of G either remains the same or decreases by 1
with the removal of an edge. If H is obtained from G by deleting an isolated vertex,
then, also clearly, $vs(H) \le k$.

Suppose that H is obtained from G by contracting the edge uv. Let $\ell$ denote a
layout of G whose $s_\ell \le k$ [...] on the vertex separation at each location i in the range
$1 \le i < \ell(u)$. If there exists a vertex w with $\ell(w) > \ell(u)$ and either $uw \in E$
or $vw \in E$, then [...]. Similar arguments establish [...] the ranges $\ell(u) < i < \ell(v)$
and $\ell(v) \le i \le n$. Therefore, the vertex separation of [...] $\le k$ and $vs(H) \le k$.

We conclude that, in any case, [...] F is minor-closed. It remains [...] vertex
separation (such an [...] $O(n^2)$ [...]. □

Given a graph G and a positive integer k, the NP-complete SEARCH NUMBER
problem [19] asks whether k searchers are sufficient to ensure the capture of a fugitive
who is free to move with arbitrary speed about the edges of G, with complete
knowledge of the location of the searchers. More precisely, we say that every edge of
G is initially contaminated. An edge e = uv becomes clear either when a searcher is
moved from u to v (v to u) while another searcher remains at u (v), or when all edges
incident on u (v) except e are clear and a searcher at u (v) is moved to v (u). (A clear
edge e becomes recontaminated if the movement of a searcher produces a path without
searchers between a contaminated edge and e.) The goal is to determine if there
exists a sequence of search steps that results in all edges being clear simultaneously,
where each such step is one of the following operations: (1) place a searcher on
a vertex, (2) move a searcher along an edge, (3) remove a searcher from a vertex.

[...] SEARCH NUMBER is decidable in O([...]) time [...]. As [...] noted by
Papadimitriou [...], however, minor-closure can [...].

Theorem 3.2. For any fixed k, SEARCH NUMBER can be decided in $O(n^2)$
time.

Proof. [...] excluded trees. □

Consider next the NP-complete MAX LEAF SPANNING TREE problem [11].
Given a connected graph G and a positive integer k, this problem asks whether G
possesses a spanning tree in which k or more vertices have degree one. This problem
can be solved by brute force in O([...]) time. (There are $\binom{n}{k}$ ways to select k leaves
and O(n) possible adjacencies to consider [...] $O(n^{2k})$ candidate solutions, the
connectivity of the remainder of G can be determined in [...] number of edges.)
Although this means that MAX LEAF SPANNING TREE [...] minor-closure so as
to ensure a [...] polynomial running time.

Theorem 3.3. For any fixed k, MAX LEAF SPANNING TREE can be decided
in $O(n^2)$ time.

Proof. Let k denote any fixed positive integer. Consider the proper subset of the
“no” instances, the family F of graphs none of whose connected components has a
spanning tree with k or more leaves. [...] closed under the minor ordering, from
which the theorem follows because we need only test an input graph for connectedness
and nonmembership in F. □

4. Exploiting the immersion order. An embedding of an arbitrary graph G
into a fixed constraint graph C is an injection $f: V(G) \to V(C)$ together with an
assignment, to each edge uv of G, of a path from f(u) to f(v) in C. The minimum
load factor of G relative to C is the minimum, over all embeddings of G in C, of the
maximum number of paths in the embedding that share a common edge in C.

For example, for the case in which C is the infinite-length one-dimensional grid,
the minimum load factor of G with respect to C is called the cutwidth of G. In the
NP-complete MIN CUT LINEAR ARRANGEMENT problem [11], we are given a
graph G and an integer k, and are asked whether the cutwidth of G is no more than
k. Related NP-complete problems address the minimum load factor of G relative to
C when C is the infinite-length, fixed-width two-dimensional grid (2-D GRID LOAD
FACTOR) or when C is the infinite-height binary tree (BINARY TREE LOAD FACTOR).
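
For the one-dimensional case, the load factor of a layout is simply the largest number of edges crossing any gap between consecutive vertices. The following brute-force sketch (illustrative names, exhaustive search over layouts, so only suitable for tiny graphs) computes cutwidth directly from that description; multiple edges are handled by listing an edge more than once.

```python
from itertools import permutations


def layout_cutwidth(order, edges):
    """Largest number of edges crossing a gap between consecutive vertices."""
    pos = {v: i for i, v in enumerate(order)}
    width = 0
    for gap in range(len(order) - 1):  # cut between positions gap and gap + 1
        crossing = sum(
            1 for u, v in edges
            if min(pos[u], pos[v]) <= gap < max(pos[u], pos[v])
        )
        width = max(width, crossing)
    return width


def cutwidth(vertices, edges):
    """Brute-force cutwidth: minimise over all linear layouts."""
    return min(layout_cutwidth(p, edges) for p in permutations(vertices))


star = [("c", "x"), ("c", "y"), ("c", "z")]
print(cutwidth("cxyz", star))                     # star with three leaves
print(cutwidth("ab", [("a", "b"), ("a", "b")]))   # a doubled edge
```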

THEOREM  4.1.  For  any  fixed  k  and  any  fixed  C,  the  family  of  graphs  for  which 
the  minimum  load  factor  relative  to  C  is  less  than  or  equal  to  k  is  closed  under  the 
immersion  ordering. 

Proof. Let an embedding f of G in C with load factor no more than k be given.
Suppose that $H \le_i G$. If $H \subseteq G$, then the embedding that restricts f to H clearly
has load factor no more than k. If H is obtained from G by lifting the edges uv and
vw incident at vertex v, then an embedding for H can be defined by assigning to the
resulting edge uw the composition of the paths from u to v and from v to w in C.
This cannot increase the load factor. □
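
The lifting step of the proof can be checked experimentally in the one-dimensional case. The sketch below uses hypothetical helper names and represents a multigraph as a plain list of vertex pairs; it lifts a pair of adjacent edges and compares the load of every layout before and after.

```python
from itertools import permutations


def lift(edges, u, v, w):
    """Lift the adjacent edges uv and vw: delete one copy of each, add uw."""
    if len({u, v, w}) != 3:
        raise ValueError("u, v and w must be distinct")
    remaining = list(edges)
    for a, b in ((u, v), (v, w)):
        for e in remaining:
            if set(e) == {a, b}:
                remaining.remove(e)  # safe: we stop iterating right away
                break
        else:
            raise ValueError(f"no edge between {a} and {b}")
    return remaining + [(u, w)]


def layout_cutwidth(order, edges):
    """Largest number of edges crossing a gap between consecutive vertices."""
    pos = {v: i for i, v in enumerate(order)}
    return max(
        sum(1 for a, b in edges if min(pos[a], pos[b]) <= gap < max(pos[a], pos[b]))
        for gap in range(len(order) - 1)
    )


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
lifted = lift(k4, 0, 1, 2)  # replaces 0-1 and 1-2 by a second copy of 0-2
never_worse = all(
    layout_cutwidth(p, lifted) <= layout_cutwidth(p, k4)
    for p in permutations(range(4))
)
print(never_worse)
```

The same layout serves both graphs because lifting keeps the vertex set unchanged, which mirrors the way the proof reuses the embedding f.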

Corollary  4.2.  For  any  fixed  k,  MIN  CUT  LINEAR  ARRANGEMENT,  2-D 
GRID LOAD FACTOR, and BINARY TREE LOAD FACTOR can be decided in
polynomial  time. 

This result has previously been reported for MIN CUT LINEAR ARRANGEMENT,
using an algorithm with time complexity [16]. We now prove that
it is sometimes possible to employ excluded-minor knowledge on immersion-closed
families to guarantee quadratic-time decision complexity.

Theorem 4.3. For any fixed k, MIN CUT LINEAR ARRANGEMENT, 2-D
GRID LOAD FACTOR, and BINARY TREE LOAD FACTOR can be decided in
$O(n^2)$ time.

Proof. For MIN CUT LINEAR ARRANGEMENT, it is known that there are
binary trees with cutwidth exceeding k for any fixed k [2]. Let T denote such a tree.
Because T has maximum degree three, it follows that $G \ge_m T$ implies $G \ge_i T$. Thus
no G with $G \ge_m T$ can be a “yes” instance (recall that the “yes” family is immersion
closed) and we know from [22] that all “yes” instances have bounded tree-width.
(Tree-width and the associated metric branch-width are defined and related to each
other in [23].) Now we need only search for a satisfactory tree-decomposition, using the
$O(n^2)$ method of [24]. Testing for obstruction containment in the immersion order
can be done in linear time on graphs of bounded tree-width in this setting [24], given
such a tree-decomposition.

Sufficiently large binary trees are excluded for 2-D GRID LOAD FACTOR as
well (recall that both k and the grid-width are fixed).

For BINARY TREE LOAD FACTOR, it is a simple exercise to see that all “yes”
instances have bounded tree-width by building a tree-decomposition with width at
most 3k from a binary tree embedding with load factor at most k. (The decomposition
tree T can be taken to be the finite subtree of C that spans the image of G. For vertex
$u \in V(T)$, the associated set of vertices of G contains the inverse image of u if one
exists, and every vertex $v \in V(G)$ with an incident edge that is assigned a path in C
that includes u.) □

5. Other methods. The application of Theorems 2.1-2.4 directly ensures
polynomial-time decidability. A less direct approach relies on the well-known notion of
polynomial-time transformation, as we now illustrate with an example. The
NP-complete MODIFIED MIN CUT problem was first introduced in [13]. Given a linear
layout $\ell$ of a simple graph G, the modified cutwidth at location i, $c_\ell(i)$, is $|\{e : e =
uv \in E$ such that $\ell(u) < i$ and $\ell(v) > i\}|$. The modified cutwidth of the entire layout
is $c_\ell = \max\{c_\ell(i) : 1 \le i \le n\}$, and the modified cutwidth of G is $mc(G) = \min\{c_\ell : \ell$
is a linear layout of $G\}$. Given both G and a positive integer k, the MODIFIED MIN
CUT problem asks whether mc(G) is less than or equal to k. Observe that, while the
MIN CUT LINEAR ARRANGEMENT problem addresses the number of edges that
cross any cut between adjacent vertices in a linear layout, the MODIFIED MIN CUT
problem addresses the number of edges that cross (and do not end at) any cut on a
vertex in the layout.
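
The difference from ordinary cutwidth is easiest to see in code: the cut sits on a vertex, and only edges that pass strictly over that vertex are counted. This is a small brute-force sketch with illustrative names, intended for tiny simple graphs only.

```python
from itertools import permutations


def layout_modified_cutwidth(order, edges):
    """Max over locations i of the edges passing strictly over position i."""
    pos = {v: i for i, v in enumerate(order, start=1)}
    return max(
        (
            sum(1 for u, v in edges if min(pos[u], pos[v]) < i < max(pos[u], pos[v]))
            for i in range(1, len(order) + 1)
        ),
        default=0,
    )


def modified_cutwidth(vertices, edges):
    """Brute-force mc(G): minimise over all linear layouts."""
    return min(layout_modified_cutwidth(p, edges) for p in permutations(vertices))


path = [("a", "b"), ("b", "c"), ("c", "d")]
cycle = path + [("d", "a")]
print(modified_cutwidth("abcd", path))   # no edge ever passes over a vertex
print(modified_cutwidth("abcd", cycle))  # the closing edge must pass over one
```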

When  k  is  fixed,  neither  the  family  of  “yes”  instances  nor  the  family  of  “no” 
instances  for  MODIFIED  MIN  CUT  is  closed  under  either  of  the  available  orders. 
Nevertheless,  we  can  employ  a  useful  consequence  of  well-partially-ordered  sets. 

Consequence (see [8]). If $(S, \le)$ is a well-partially-ordered set that supports
polynomial-time order tests for every fixed element of S, and if there is a
polynomial-time computable map $t: D \to S$ such that for $F \subseteq D$, (a) $t(F) \subseteq S$ is closed under
$\le$ and (b) $t(F) \cap t(D - F) = \emptyset$, then there is a polynomial-time decision algorithm to
determine for input z in D whether z is in F.

To use this result on fixed-k MODIFIED MIN CUT, observe that if any vertex
of a simple graph G has degree greater than 2k + 2, then G is automatically a “no”
instance. Given a simple graph G with maximum degree less than or equal to 2k + 2,
we first augment G with loops as follows: if a vertex v has degree d < 2k + 2, then it
receives (2k + 2) − d new loops. Letting G' denote this augmented version of G, we
now replace G' with the Boolean matrix M, in which each row of M corresponds to
an edge of G' and each column of M corresponds to a vertex of G'. That is, M has
|E(G')| rows and n columns, with $M_{ij} = 1$ if and only if edge i is incident on vertex j. M
and k' = 3k + 2 are now viewed as input to the GATE MATRIX LAYOUT problem
[3], in which we are asked whether the columns of M can be permuted so that, if in
each row we change to * every 0 lying between the row’s leftmost and rightmost 1,
then no column contains more than k' 1s and *s. Thus a permutation of the columns
of M corresponds to a linear layout of G. For such a permutation, each * in column
i, $1 \le i \le n$, represents a distinct edge crossing a cut at vertex i in the corresponding
layout of G.
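
The transformation can be sketched on a small instance. The code below uses illustrative names, treats each padding loop as a row with a single 1, and compares the column bound 3k + 2 with the modified cutwidth of the corresponding layout. Under these assumptions every column holds exactly 2k + 2 ones, so the cost of a column permutation should equal 2k + 2 plus the modified cutwidth of that layout.

```python
from itertools import permutations


def augmented_incidence_matrix(vertices, edges, k):
    """Rows are the edges of G' (edges of G plus padding loops); columns are vertices."""
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if any(d > 2 * k + 2 for d in degree.values()):
        raise ValueError("maximum degree exceeds 2k + 2: a 'no' instance")
    rows = [{u, v} for u, v in edges]
    for v in vertices:
        rows.extend({v} for _ in range(2 * k + 2 - degree[v]))  # padding loops
    return [[1 if v in row else 0 for v in vertices] for row in rows]


def gate_matrix_cost(matrix, column_order):
    """Largest number of 1s and *s in a column after filling each row's span."""
    spans = []
    for row in matrix:
        ones = [p for p, col in enumerate(column_order) if row[col]]
        spans.append((ones[0], ones[-1]))  # every row has at least one 1
    return max(
        sum(1 for lo, hi in spans if lo <= p <= hi)
        for p in range(len(column_order))
    )


def layout_modified_cutwidth(order, edges):
    """Max over positions of the edges passing strictly over that position."""
    pos = {v: i for i, v in enumerate(order)}
    return max(
        sum(1 for u, v in edges if min(pos[u], pos[v]) < i < max(pos[u], pos[v]))
        for i in range(len(order))
    )


vertices = ["a", "b", "c", "d"]
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
k = 1
M = augmented_incidence_matrix(vertices, cycle, k)
best = min(gate_matrix_cost(M, cols) for cols in permutations(range(4)))
print(best, "<=", 3 * k + 2, "is", best <= 3 * k + 2)
```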

Theorem 5.1. For any fixed k, MODIFIED MIN CUT can be decided in $O(n^2)$
time.

Proof. We apply the consequence, using the set of all graphs for S, $\le_m$ for $\le$, the
set of simple graphs of maximum degree 2k + 2 for D, the family of “yes” instances
in D for F, and the composition of the map just defined from graphs to matrices
with the map of [5] from matrices to graphs for t. Testing for membership in D and
computing t are easily accomplished in $O(n^2)$ time. That t(F) is closed under $\le_m$
and excludes a planar graph for any fixed k is established in [5]. Finally, condition (b)
holds because, for any G in D, t(G) is a “yes” instance for GATE MATRIX LAYOUT
with parameter 3k + 2 if and only if G is a “yes” instance for MODIFIED MIN CUT
with parameter k. □

6. Search problems. Given a decision problem $\Pi_D$ and its search version $\Pi_S$,
any method that pinpoints a solution to $\Pi_S$ by repeated calls to an algorithm that
answers $\Pi_D$ is termed a self-reduction. This simple notion has been formalized with
various refinements in the literature, but the goal remains the same: to use the
existence of a decision algorithm to prove the existence of a search algorithm. Note the
crucial importance of self-reducibility in the current context, given that Theorems
2.1-2.4 only yield decision algorithms, not search procedures.

It sometimes suffices to fatten up a graph by adding edges to isolate a solution.
For example, this strategy can be employed to construct solutions to (fixed-k) GATE
MATRIX LAYOUT, when any exist, in $O(n^4)$ time [1]. It follows from the proof of
Theorem 5.1 that the same can be said for MODIFIED MIN CUT as well. We leave
it to the reader to verify that such a scheme works for the search version of (fixed-k)
VERTEX SEPARATION, by attempting to add each edge in $V \times V - E$ in arbitrary
order, retaining in turn only those whose addition maintains a “yes” instance, and at
the end reading off a satisfactory layout (from right to left) by successively removing a
vertex of smallest degree. This self-reduction automatically solves the search version
of SEARCH NUMBER, also (see the discussion of “2-expansions” in [4]).

Conversely, it is sometimes possible to trim down a graph by deleting edges so
as to isolate a solution. It is easy to see that this simple strategy yields an
$O(n^4)$-time algorithm for the search version of (fixed-k) MAX LEAF SPANNING TREE, by
attempting to delete each edge in E in arbitrary order, retaining in turn only those
whose deletion does not maintain a “yes” instance.

Another technique involves the use of graph gadgets. A simple gadget, consisting
of two new vertices with k edges between them, is useful in constructing a solution to
(fixed-k) MIN CUT LINEAR ARRANGEMENT, when any exist, in $O(n^4)$ time [1].
A similar use of gadgets enables efficient self-reductions for load factor problems. (On
BINARY TREE LOAD FACTOR, for example, we can begin by using two k-edge
gadgets uv and wx to locate a vertex y of the input graph that can be mapped to a
leaf of the constraint tree by identifying u, w, and y.)

Indeed,  polynomial-time  self-reductions  exist  for  all  of  the  problems  that  we  study 
in  this  paper.  In  addition  to  the  straightforward  methods  just  mentioned,  faster  but 
more  elaborate  techniques  are  described  in  [6],  [9]. 

7. Concluding remarks. The range of problems amenable to an approach
based on well-partially-ordered sets is remarkable. Although the problems that we
have addressed in this paper are all fixed-parameter versions of problems that are
NP-hard in general, we remind the reader that by fixing parameters we do not
automatically trivialize problems, and thereby obtain polynomial-time decidability
(consider, for example, GRAPH k-COLORABILITY [11]). Moreover, the techniques that
we have employed can be used to guarantee membership in P for problems that have
no associated (fixed) parameter [8].

Table 1 suffers from one notable omission, namely, BANDWIDTH [11]. The
only success reported to date has concerned restricted instances of TOPOLOGICAL
BANDWIDTH. Both BANDWIDTH and the related EDGE BANDWIDTH problem
[7] have resisted this general line of attack so far. Clearly, BANDWIDTH is at least
superficially similar to other layout permutation problems we have addressed, and
fixed-k BANDWIDTH, like the others, is solvable in (high-degree) polynomial time
with dynamic programming [12]. Perhaps BANDWIDTH, however, is really different;
it is one of the very few problems that remain NP-complete when restricted to trees
[10].

The results that we have derived here immediately extend to hypergraph problem
variants as long as hypergraph instances can be efficiently reduced to graph
instances. For example, such reductions are known for HYPERGRAPH VERTEX
SEPARATION and HYPERGRAPH MODIFIED MIN CUT [17], [27].

Finally, we observe that even partial-orders that fail to be well-partial-orders
(on the set of all graphs) may be useful. For example, although it is well known
that graphs are not well-partially-ordered under the topological order, it has been
shown [15] that, for every fixed h, all graphs without h vertex-disjoint cycles are
well-partially-ordered under topological containment. Also, polynomial-time order tests
exist [24]. Problems such as (fixed-k) TOPOLOGICAL BANDWIDTH, therefore,
are decidable in polynomial time as long as the input is restricted to graphs with no
more than h disjoint cycles (for fixed h). Similarly, we might employ the result [26]
that graphs without a path of length h, for h fixed, are well-partially-ordered under
subgraph containment.

whose extremely careful review of the original version of this paper greatly helped to

Abstract — This note reports a collection of constructions of
symmetric networks that provide the largest known values for the
number of nodes that can be placed in a network of a given degree and
diameter. Some of the constructions are in the range of current potential
engineering significance. The constructions are Cayley graphs of linear
groups obtained by experimental computation.

Index  Terms — Cayley  graphs,  interconnection  networks,  linear  groups. 


I.  Introduction 

The  problem  of  constructing  large  graphs  of  a  given  degree  and 
diameter  has  received  much  attention,  and  is  significant  for  parallel 
processing  because  it  models  two  important  constraints  in  the  design 
of  massively  parallel  processing  systems:  1)  there  are  limits  on  the 
number  of  processors  to  which  any  processor  in  the  network  can  be 
directly  connected,  and  2)  the  distance  between  any  two  processors 
in  the  network  should  not  be  too  great.  Other  applications  of  such 
networks  include  shared-key  cryptographic  protocols  and  the  design 
of  local  area  networks.  See  [3]  and  [9]  for  recent  surveys. 

In  this  paper  we  give  evidence  that  the  table  of  largest  known 
constructions  for  small  values  of  the  two  parameters  can  be  improved 
for  many  parameter  values  by  methods  based  on  finite  linear  groups. 
In  many  cases  the  networks  we  describe  here  are  dramatically  larger 
than  those  previously  known. 

Many of our improvements are in the range of the numbers of
processors currently being considered for large parallel processing
systems, suggesting that some of these constructions may merit
further investigation for such applications. This is the focus of
continuing research by some of our party. In this note we present
only our accumulated results on the now classic problem of network
construction. In particular, we do not address the many interesting
problems concerning routing and data exchange that would be crucial
for most parallel processing applications.

For an overview of our results see Table I, which represents an
updated version, obtained from Bermond [5], of the table published in
[3]. Interested readers are advised that a “current” table incorporating
the results of many workers on this problem is maintained by and
available from that helpful source. The entries in the table that are
due to our efforts and reported on in this note are marked in bold.
Other entries that have been obtained by Cayley graph techniques
are marked with an asterisk. In particular, two other groups of
researchers have recently and independently obtained record-breaking
constructions based on linear groups [4], [8].

II.  Algebraic  Symmetry  as  an  Organizing  Principle  for 
Parallel  Processing 

There  are  important  considerations  apart  from  degree  and  diameter 
that  must  figure  in  any  choice  of  network  topology  for  parallel 
computation.  A  network  is  (vertex-)  symmetric  if  for  any  two  nodes 
u,  v  there  is  an  automorphism  of  the  network  mapping  u  to  v.  Our 
approach  yields  symmetric  constructions,  and  we  believe  that  in  this 
may  lie  their  greater  value.  Symmetry  is  one  of  the  most  powerful 
and  natural  tools  to  apply  to  the  central  problem  of  massively 
parallel  computation:  how  to  organize  and  coordinate  computational 
resources. 

The  symmetries  of  the  networks  we  describe  are  represented  by 
simple  algebraic  operations  (such  as  2  x  2  matrix  multiplications  and 
modulo  arithmetic).  The  main  advantage  of  algebraically  constructed 
networks  is  that  the  developed  mathematical  resources  of  algebra  are 
available  to  structure  the  problems  of 

1)  design  and  description 

2)  testing 

3)  data  exchange  and  routing 

4)  scheduling  and  computation  mapping. 

The appeal of hypercubes, cube-connected cycles, butterfly networks,
and others rests in large part on the availability of easily
computed (and comprehended) symmetries. These popular network
designs and those that we describe all belong to a class of algebraic
networks based on vector spaces and their symmetry groups. For recent
algebraic approaches to routing algorithms, deadlock avoidance,
emulation, and scheduling for algebraically described networks of
this kind see [2], [1], [8], [11], and [12].

Our  main  result  in  this  brief  paper  is  a  demonstration  that  algebraic 
symmetry provides a powerful approach to problem 1), design and
description.  Our  approach  centers  on  the  following  definition. 

Definition: If A is a group and $S \subseteq A$ is a generating set that
is closed under inverses, i.e., $S = S \cup S^{-1}$, then the (undirected)
Cayley graph (A, S) is the graph with vertex set A and with an edge
between elements a and b of A if and only if as = b for some $s \in S$.
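
As a small illustration of the definition, the following sketch builds the adjacency sets of a Cayley graph from an explicit list of group elements and a multiplication function. The names are illustrative; the code checks that the identity is excluded and that S is closed under inverses, but it does not verify that S generates the whole group.

```python
from itertools import permutations


def cayley_graph(elements, op, generators, identity):
    """Adjacency sets of the undirected Cayley graph (A, S)."""
    gens = set(generators)
    if identity in gens:
        raise ValueError("the identity would create loops")
    for s in gens:
        if not any(op(s, t) == identity for t in gens):
            raise ValueError("generating set is not closed under inverses")
    # a is adjacent to b exactly when a * s = b for some s in S
    return {a: {op(a, s) for s in gens} for a in elements}


# Cyclic group Z_6 with S = {1, 5}: a 6-cycle.
z6 = cayley_graph(range(6), lambda a, b: (a + b) % 6, {1, 5}, 0)

# Symmetric group S_3, permutations as tuples, with a transposition
# and a 3-cycle together with its inverse.
def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

s3 = cayley_graph(
    list(permutations(range(3))),
    compose,
    {(1, 0, 2), (1, 2, 0), (2, 0, 1)},
    (0, 1, 2),
)
print(sorted(z6[0]), len(s3))
```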

Every Cayley network is symmetric (symmetries are given by
group multiplication). The degree of a Cayley graph (A, S) is
$\Delta = |S|$ and the diameter of (A, S) is

$D = \max_{a \in A} \{\min t : a = s_1 s_2 \cdots s_t,\ s_i \in S,\ i = 1, \ldots, t\}.$

It is remarkable (but, indeed, natural) that most networks that have
been considered for large parallel processing systems (including
hypercubes, torus grids, cube-connected-cycles and butterfly networks)
are Cayley graphs. A standard reference on Cayley graphs is [7]. For
a Cayley graph description of the cube-connected-cycles see [10].

TABLE I

Δ \ D     2     3      4       5        6        7         8         9         10
3        10    20     38      70      128      184       320       540        938
4        15    40     95     364      734     1081*     2943*     7439*     15657*
5        24    70    182     532     2742     4368     11200     33600     123120
6        32   105    355    1081     7832    13310     50616    202464     682080
7        50   128    506    2162    10554    39732    140000    911088    2002000
8        57   203    812    3081    39258    89373    455544   1822176    3984120
9        74   585   1248    6072    74954   215688    910000   3019632   15686400
10       91   650   1820   12144   132932   486837   2002000   7714494   47059200

Symmetry  immediately  provides  the  following  advantage  for  the 
design  problem  considered  here:  to  compute  the  diameter  of  a  Cayley 
graph  it  is  only  necessary  to  compute  the  distances  from  a  single  node 
to  all  others.  Furthermore,  the  compactness  of  an  algebraic  description 
allows  for  an  efficient  computational  search  strategy. 

Our  results  were  obtained  by  experimental  computing  with  rela¬ 
tively  simple  programs  on  small  machines  (an  IBM  PC  and  a  VAX 
11/780).  The  programs  followed  closely  the  above  expression  for 
the  diameter  of  a  Cayley  graph.  Having  focused  (by  setting  the 
appropriate  program  parameters)  on  a  particular  kind  of  matrix  group, 
and  on  a  choice  of  cardinality  for  the  generating  set  (hence  the  degree 
of  the  resulting  graph),  the  diameter  was  computed  for  repeated 
random  choices  of  the  generating  set  until  (in  the  favorable  case) 
a  new  record  was  obtained.  Consonant  with  the  above  expression  for 
the  diameter,  this  is  done  by  starting  with  the  identity  of  the  group  as 
the  live  set,  multiplying  the  elements  of  the  live  set  with  the  elements 
of  the  generator  set,  recording  any  new  elements  obtained  (the  new 
live  set)  in  a  large  array  representing  all  elements  in  the  target  group, 
and  repeating  this  until  no  new  elements  are  obtained.  The  number 
of  repetitions  until  this  occurs  is  the  diameter. 
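
The live-set procedure just described can be sketched in a few lines. This is not the authors' Pascal program; the names are illustrative, matrices are stored as 4-tuples (a, b, c, d) for the rows (a b) and (c d), and the modulus q is assumed prime so that determinants can be inverted with Python's three-argument `pow` (Python 3.8 or later).

```python
def cayley_diameter(identity, generators, mul):
    """Return (number of elements reached, number of live-set rounds)."""
    seen = {identity}
    live = {identity}
    rounds = 0
    while True:
        # multiply the live set by every generator and keep only new elements
        fresh = {mul(a, s) for a in live for s in generators} - seen
        if not fresh:
            return len(seen), rounds
        seen |= fresh
        live = fresh
        rounds += 1


def mat2_mul(q):
    """Multiplication of 2 x 2 matrices mod q."""
    def mul(a, b):
        return (
            (a[0] * b[0] + a[1] * b[2]) % q, (a[0] * b[1] + a[1] * b[3]) % q,
            (a[2] * b[0] + a[3] * b[2]) % q, (a[2] * b[1] + a[3] * b[3]) % q,
        )
    return mul


def mat2_inv(a, q):
    """Inverse of an invertible 2 x 2 matrix mod a prime q."""
    det_inv = pow((a[0] * a[3] - a[1] * a[2]) % q, -1, q)
    return ((a[3] * det_inv) % q, (-a[1] * det_inv) % q,
            (-a[2] * det_inv) % q, (a[0] * det_inv) % q)


def with_inverses(gens, q):
    return set(gens) | {mat2_inv(g, q) for g in gens}


# Sanity checks on groups small enough to verify by hand.
print(cayley_diameter(0, [1, 9], lambda a, b: (a + b) % 10))
gl22 = with_inverses([(0, 1, 1, 0), (1, 1, 0, 1)], 2)
print(cayley_diameter((1, 0, 0, 1), gl22, mat2_mul(2)))

# The degree 5 generators over the integers mod 13 listed in this note;
# the note reports 4368 vertices and diameter 7 for this construction.
ex1 = with_inverses([(0, 1, 1, 0), (11, 4, 7, 5), (11, 2, 8, 12)], 13)
print(len(ex1), cayley_diameter((1, 0, 0, 1), ex1, mat2_mul(13)))
```

Because a Cayley graph is vertex-symmetric, the search starts from the identity only, as the text notes. The `seen` set plays the role of the large array of group elements.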

The  reader  may  reasonably  wonder  about  several  things,  beginning 
with  the  large  number  of  authors  of  this  note  and  including  perhaps 
the  question  of  whether  some  voodoo  was  employed  in  choosing 
the  target  groups  and  in  exploring  the  search  space  of  generator 
sets.  The  explanation  of  the  first  is  simply  that  exploration  of  this 
approach  to  this  design  problem  has  continued  among  us  at  a  low 
level  for  a  number  of  years  beginning  with  the  seminal  work  of  the 
author  subset:  Carlsson  and  Sexton.  Although  we  have  tried  several 
“sophisticated”  heuristics  for  choosing  groups  and  generators,  we 
must  honestly  report  that  none  of  these  has  proven  better  than  simple 
and  straightforward  random  search,  with  the  exception  of  the  nearly 
obvious guidance that one should choose a nonabelian group! Several
of  our  record-breaking  constructions  employ  upper-triangular  matrix 
groups,  but  we  are  unable  to  explain  why  these  worked  better  than 
other  possibilities. 

Thus,  in  some  sense  these  results  are  less  interesting  than  one  might 
at  first  suppose,  although  the  above  information  may  underscore  our 
main  point:  the  power  of  an  algebraic  approach  (even  a  simple  one). 
A  sophisticated  understanding  of  what  is  possible  by  the  method  of 
Cayley  graphs  would  be  highly  desirable,  but  it  seems  to  present  a 
difficult  mathematical  problem. 

The  next  section  describes  some  examples  of  our  constructions  and 
the  associated  costs  of  our  computational  explorations. 

III.  Example  Constructions 

Given  that  a  “winning”  set  of  generators  exists  for  a  group  it 
would be interesting to know the expected time for random search to
discover a winning set. We have no real information on this (it would
seem  to  be  a  difficult  mathematical  problem  to  give  any  bounds),  but 
we  do  indicate  in  the  example  descriptions  that  follow  the  time  that 
was  required  for  the  particular  search  that  uncovered  the  construction 
as  a  rough  indication  of  the  amount  of  computational  effort  involved. 
About  half  of  the  record-breaking  constructions  that  we  report  here 
(the  ones  of  smaller  order!)  were  obtained  on  a  PC,  by  a  search 
program  running  in  some  cases  for  only  a  few  minutes  and  in  some 
cases  for  a  few  days.  For  the  approach  that  we  have  taken  memory 
is  a  more  important  computational  bottleneck  than  speed. 

In what follows GL[n,q] denotes the (general linear) group of
invertible n × n matrices with entries in the finite field with q elements
(since below q is always a prime, this is just the integers mod q), and
SL[n,q] is the special linear subgroup of GL[n,q] consisting of those
matrices with determinant 1.
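
The orders of these groups follow from the standard counting formulas, which gives a quick consistency check on the group orders quoted in this note. The sketch below assumes q is prime and uses illustrative function names; PSL denotes the quotient of SL by its scalar matrices. By Lagrange's theorem the order of any subgroup must divide the order of the ambient group.

```python
from math import gcd


def order_gl(n, q):
    """|GL[n,q]| = (q^n - 1)(q^n - q) ... (q^n - q^(n-1))."""
    order = 1
    for i in range(n):
        order *= q**n - q**i
    return order


def order_sl(n, q):
    """|SL[n,q]|: the determinant map onto the q - 1 nonzero scalars is onto."""
    return order_gl(n, q) // (q - 1)


def order_psl(n, q):
    """|PSL[n,q]|: SL modulo its gcd(n, q - 1) scalar matrices."""
    return order_sl(n, q) // gcd(n, q - 1)


print(order_gl(2, 19), order_sl(2, 23), order_psl(2, 23))
print(order_gl(2, 13) % 4368 == 0)  # 4368 is a possible subgroup order
```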

Example 1: Degree 5, diameter 7: 4368 vertices.

This is a Cayley graph on the subgroup of GL[2,13] consisting of
the matrices with determinant in the set {1, −1}. The generators are
the following elements together with their inverses.

\[
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \text{ order 2}, \qquad
\begin{pmatrix} 11 & 2 \\ 8 & 12 \end{pmatrix} \text{ order 52}, \qquad
\begin{pmatrix} 11 & 4 \\ 7 & 5 \end{pmatrix} \text{ order 14}.
\]

The discovery time for this construction was approximately
10 hours on an IBM PC for a small Pascal program.

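Example 2: Degree 8, diameter 7: 89373 vertices.

This is a Cayley graph on a subgroup of GL[3,31]. The generators
are the following elements together with their inverses.

\[
\begin{pmatrix} 1 & 12 & 10 \\ 0 & 1 & 15 \\ 0 & 0 & 25 \end{pmatrix} \text{ order 93}, \qquad
\begin{pmatrix} 1 & 25 & 15 \\ 0 & 1 & 4 \\ 0 & 0 & 5 \end{pmatrix} \text{ order 93},
\]
\[
\begin{pmatrix} 1 & 29 & 29 \\ 0 & 1 & 16 \\ 0 & 0 & 5 \end{pmatrix} \text{ order 93}, \qquad
\begin{pmatrix} 1 & 27 & 5 \\ 0 & 1 & 8 \\ 0 & 0 & 1 \end{pmatrix} \text{ order 31}.
\]

The discovery time for this construction was approximately 3 hours
of CPU time on a VAX 11/780.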

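Example 3: Degree 10, diameter 5: 12144 vertices.

This is a Cayley graph on the group SL[2,23]. The generators are
the following elements together with their inverses.

\[
\begin{pmatrix} 9 & 0 \\ 18 & 18 \end{pmatrix} \text{ order 11}, \qquad
\begin{pmatrix} 13 & 10 \\ 18 & 21 \end{pmatrix} \text{ order 11}, \qquad
\begin{pmatrix} 14 & 7 \\ 19 & 3 \end{pmatrix} \text{ order 22},
\]
\[
\begin{pmatrix} 18 & 13 \\ 17 & 20 \end{pmatrix} \text{ order 24}, \qquad
\begin{pmatrix} 9 & 10 \\ 0 & 17 \end{pmatrix} \text{ order 22}.
\]

The discovery time for this construction was approximately 2 hours
on an IBM PC.
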
TABLE II

Parameters              Order      Group
degree 5 diameter 7     4368       subgroup of GL[2,13]
degree 5 diameter 8     8788       subgroup of GL[3,13]
degree 5 diameter 9     25308      PSL[2,37]
degree 5 diameter 10    123120     GL[2,19]
degree 6 diameter 4     355        subgroup of GL[2,71]
degree 6 diameter 5     1081       subgroup of GL[2,47]
degree 6 diameter 7     13310      subgroup of GL[3,11]
degree 6 diameter 8     50616      SL[2,37]
degree 6 diameter 9     202464     subgroup of GL[2,37]
degree 6 diameter 10    682080     GL[2,29]
degree 7 diameter 4     506        subgroup of GL[2,23]
degree 7 diameter 5     2162       subgroup of GL[2,47]
degree 7 diameter 7     39732      PSL[2,43]
degree 7 diameter 8     101232     subgroup of GL[2,37]
degree 7 diameter 9     911088     subgroup of GL[2,37]
degree 7 diameter 10    1822176    GL[2,37]
degree 8 diameter 3     203        subgroup of GL[2,29]
degree 8 diameter 4     812        subgroup of GL[2,29]
degree 8 diameter 5     3081       subgroup of GL[2,79]
degree 8 diameter 7     89373      subgroup of GL[3,31]
degree 8 diameter 8     455544     subgroup of GL[2,37]
degree 8 diameter 9     1822176    GL[2,37]
degree 9 diameter 5     6072       PSL[2,23]
degree 9 diameter 8     682080     GL[2,29]
degree 10 diameter 5    12144      SL[2,23]
degree 10 diameter 8    1822176    subgroup of GL[2,37]


Generators:  order  S  =  S  U  5 


-l 


[0,1,1,0]:2  [11,4,7,53:14  [11,2,8,12]:52 

[1,0,4,0,1,0,0,0,123:2  [1,5,6,0,1,9,0,0,53:52  [1,2,12,0,1,8,0,0,53:52 
[0,36,1,03:2  [34,26,34,13:37  {2,16,11,33]:37 
[0,1,1,03:2  [11,16,0,153:18  [16,1 1,2,0} :45 
[54,66,0,13:5  [5,43,0,1}:5  [57,38,0,1}:5 
[7,20,0,1 } :23  [6,33,0,1}:23  [9,42,0,1]:23 

[1,2,7,0,1,0,0,0,10}:22  [1,5, 2,0, 1,2,0, 0,4]:55  [1,6,10,0,1,3,0,0,5}:55 
[32,24,35,2):19  [23,16,28,34} :36  [12,24,15,27}:37  . 

[25,1,31,1}:36  [12,35,23,30}:76  [12,4,28,16}:152 

[28,10,8,83:28  [17,13,16,27]:28  [3,4,27,14] :840 

[22,1,0,1]:2  [13,16, 0,1}:11  [3,16,0,1]:11  [19,12,0,1]:22 

[46,1,0,1}:2  [4,20,0, 1]:23  [20,27,0,1]:46  [29,14,0, 1]:46 

[0,42,1,0]:2  [18,16,38,41}:22  [34,2,37,6] :22  [8,28,14,33]:43 

[0,1,1,0]:2  [21,34,17,17]:6  [21,1,4,2]:9  [27, 26, 4, 8]: 74 

[0,1, 1,0] :2  [23,17,14,26]:  18  [25,16,13,6]:36  [27,33, 19, 22]:684 

[0,1,1,03:2  [1,19,14,16]:17  [36,1,12,0]:  18  [35,28,34,12]:456 

[16,9,0,11:7  [16,21,0, 1]:7  [25,15, 0,1]:7  [25,9,0,1]:7 

[12,1,0,11:4  [20,24,0, 1]:7  [6,27,0,1}:14  [15,18,0,1]:28 

[46,43,0,1]:13  [49,72,0, 1}:39  [19,26, 0,1]:39  [13,13,0,1]:39 

[1, 4,25,0, 1,23,0,0,1]:31  [1,29,29, 0,1,16, 0,0,5]:93  [1,12,10,0,1,15,0,0,25} :93 

[1,6,17,0,1,24,0,0,5}:93 

[21,9,17,5]:57  [0,26,3,1]:171  [28,32, 33,33]:171  [9,34,25,16]:342 

[12,13, 34, 33]:18  [36,6,20,10]:36  [35,3,19,35]:684  [26,10, 36, 31]:1368 

[0,22, 1,0] :2  [2,18,4f2]:ll  [10,1,21,16]:24  [6,19,4,9]:24  [22,0,1,22]:46 

[0,1,1, 0]:2  [5,22,18,26]:14  [17,15,21,4]:840  [2,5,10,21]:840  [23,12,11,21]:840 

[9,0,18,18]:11  [13,10,18,21]:11  [9,10,0,17]:22  [14,7,19,3]:22  [18,13, 17,20]:24 

[21,12,22,5}:57  [9,12,6,26}:456  [35,10, 17, 32}:684  [5,31,35,14}:684  [11,3,33,7]:  1368 


IV.  The  Constructions 

During  the  publication  process  for  this  note  we  have  become  aware 
of  a  new  approach  to  this  design  problem,  not  based  on  Cayley 
graphs,  that  shares  with  our  approach  the  aspects  of  1)  a  significant 
exploitation  of  symmetry,  and  2)  computational  exploration  [5].  This 
has  had  the  effect  on  this  note  of  removing  from  “bold”  seven  entries 
of  the  original  version  of  Table  I.  We  have  retained  the  descriptions 
of  the  Cayley  graphs  that  gave  those  entries  in  the  table  that  follows, 
as  they  may  still  be  of  interest  by  virtue  of  their  vertex  symmetry 
or  other  properties. 

V.  Conclusions 

Our main contribution in this brief presentation is the demonstration
of the power of an algebraic approach to the problem of
constructing large networks of a given degree and diameter. The
success of the relatively limited search we have so far conducted
seems to indicate that further exploration based on Cayley graphs
may be productive. Major problems relevant to applications in parallel
processing and not addressed here concern message routing and data
exchange. Solutions are likely to be much more complicated in such
networks as we have described than in the familiar (Cayley graph)
networks of hypercubes and cube-connected cycles, and this remains
an area for further research.

Abstract

Abrahamson, K., M.R. Fellows, M.A. Langston and B.M.E. Moret, Constructive complexity,
Discrete Applied Mathematics 34 (1991) 3-16.

Powerful and widely applicable, yet inherently nonconstructive, tools have recently become
available for classifying decision problems as solvable in polynomial time, as a result of the work
of Robertson and Seymour. These developments challenge the established view that equates
tractability with polynomial-time solvability, since the existence of an inaccessible algorithm is of very
little help in solving a problem. In this paper, we attempt to provide the foundations for a
constructive theory of complexity, in which membership of a problem in some complexity class
indeed implies that we can find out how to solve that problem within the stated bounds. Our
approach is based on relations, rather than on sets; we make much use of self-reducibility and
oracle machines, both conventional and “blind”, to derive a series of results which establish a
structure similar to that of classical complexity theory, but in which we are in fact able to prove
results which remain conjectural within the classical theory.

1.  Introduction 

Powerful and widely applicable, yet inherently nonconstructive, tools have
recently become available for classifying decision problems as solvable in
polynomial time, as a result of the work of Robertson and Seymour [24,25] (see also
[12]). When applicable, the combinatorial finite basis theorems at the core of these
developments are nonconstructive on two distinct levels. First, a polynomial-time
algorithm is only shown to exist: no effective procedure for finding the algorithm
is established. Secondly, even if known, the algorithm only decides: it uncovers
nothing like “natural evidence”. Examples of the latter had been rare; one of the few
is primality testing, where the existence of a factor can be established without,
however, yielding a method for finding said factor. Moreover, there had been a
tendency to assume that there is no significant distinction between decision and
construction for sequential polynomial time [13]. However, there is now a flood of
polynomial-time algorithms based on well-partial-order theory that produce no
natural evidence [4,5,20]. For example, determining whether a graph has a knotless
embedding in 3-space (i.e., one in which no cycle traces out a nontrivial knot) is
decidable in cubic time [20], although no recursive method is known for producing
such an embedding even when one is known to exist. What is worse (from a
computer scientist’s point of view), the nonconstructive character of these theorems is
inherent, as they are independent of standard theories of arithmetic [8]; thus
attempts at “constructivizing” these results [7] may yield practical algorithms in some
cases, but cannot succeed over the entire range of application.

* This research is supported in part by the Washington State Technology Center, by the National
Science Foundation under grant MIP-8703879, by the Office of Naval Research under contracts
N00014-88-K-0343 and N00014-88-K-0456, and by the NSERC of Canada.

These developments cast a shadow on the traditional view that equates the
tractability of a problem with the existence of a polynomial-time algorithm to solve the
problem. This view was satisfactory as long as existence of such algorithms was
demonstrated constructively; however, the use of the existential quantifier is finally
“catching up” with the algorithm community. The second level of
nonconstructiveness (the lack of natural evidence) has been troubling theoreticians for some time:
how does one trust an algorithm that only provides one-bit answers? Even when the
answer is uniformly “yes”, natural evidence may be hard to come by. (For instance,
we know that all planar graphs are 4-colorable and can check planarity in linear
time, but are as yet unable to find a 4-coloring in linear time.) Robertson and
Seymour’s results further emphasize the distinction between deciding a problem and
obtaining natural evidence for it.

All  of  this  encourages  us  to  consider  ways  in  which  some  of  the  foundations  of 
complexity  theory  might  be  alternatively  formulated  from  a  constructive  point  of 
view.  (We  intend  the  term  “constructive”  to  convey  an  informal  idea  of  our  goals; 
this  is  not  to  be  confused  with  a  constructivist  approach  [2],  which  would  certainly 
answer  our  requirements,  but  may  well  prove  too  constrictive.  However,  our  intent 
is  certainly  constructivist,  in  that  we  intend  to  substitute  for  simple  existence  certain 
acceptable  styles  of  proof  based  on  “natural”  evidence.) 

We base our development on the relationships between the three aspects of a
decision problem: evidence checking, decision, and searching (or evidence
construction). The heart of our constructive formulation is to provide for each problem to
be equipped with its own set of allowable proofs, in much the same manner as the
usual definition of NP-membership [9]; moreover, in deterministic classes, the
proof itself should be constructive. This leads us to a formalization based on
evidence generators and evidence checkers, exactly as in classical complexity
theory, but where the checker is given as part of the problem specification rather
than discovered as part of the solution. In other words, we come to the “Algorithm
Shop” prepared to accept only certain types of evidence.

The result of this approach is a formulation based on relations, rather than on
sets as in the classical formulation. This formulation provides a natural perspective
on self-reducibility and oracle complexity [1,14,27-29], concepts which have
recently received renewed scrutiny. We define oracle mechanisms through which we can
ask interesting questions about the value of proofs and the nature of
nonconstructiveness; we manage to answer some of these questions, including one which remains
a conjecture in its classical setting.


2.  Problems  and  complexity  classes 


In classical complexity theory, a decision problem is a language (or set) over some
alphabet, $L \subseteq \Sigma^*$; the main question concerning a language is the decision problem:
given some string x, is it an element of the language L? However, since a simple
“yes” or “no” answer clearly lacks credibility, one may require that the algorithm
also produce a proof for its answer, within some prespecified acceptability criteria.
Such a proof (or sketch of one) is termed evidence, and the problem of deciding
membership as well as producing evidence is called a search (or certificate
construction) problem. Finally, in order for such a proof to be of use, it must be concise
and easily checked; the problem of verifying a proof for some given string is the
checking problem. In a relational setting, all three versions admit a particularly
simple formulation; for a fixed relation $\mathcal{R} \subseteq \Sigma^* \times \Sigma^*$:

• Checking: given (x, y), is $(x, y) \in \mathcal{R}$?

• Deciding: given x, does there exist a y such that $(x, y) \in \mathcal{R}$?

• Searching: given x, find a y such that $(x, y) \in \mathcal{R}$.
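
The three versions can be made concrete for one relation. The following is a minimal Python sketch, assuming satisfiability as the relation: x is a CNF formula given as a list of clauses of signed integers, and y is a truth assignment. The function names and the brute-force search are illustrative assumptions, not part of the paper.

```python
from itertools import product

def check(formula, assignment):
    """Checking: is (formula, assignment) in the relation?

    A literal l > 0 stands for variable l, and l < 0 for its negation.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def search(formula, num_vars):
    """Searching: find some y with (formula, y) in the relation, or None.

    Exhaustive enumeration, exponential in num_vars; used here only to
    make the definition concrete.
    """
    for bits in product([False, True], repeat=num_vars):
        assignment = {var + 1: bit for var, bit in enumerate(bits)}
        if check(formula, assignment):
            return assignment
    return None

def decide(formula, num_vars):
    """Deciding: does any acceptable evidence exist at all?"""
    return search(formula, num_vars) is not None
```

Note that `decide` returns a single bit, whereas `search` returns something that `check` can validate independently.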

For the most part, the decision and search versions of a problem have been held to
be of comparable complexity, as such is indeed the case for many problems (witness
the notion of NP-completeness and NP-equivalence); as to checking, classical
complexity theory has only considered it in the case of nondeterministic classes. Schnorr
[27] has studied the relationship between the decision and search versions of a
problem and attempted to characterize this relationship for decision problems within
NP; Schöning [28] has proposed a model of computation (robust oracle machines)
under which decision automatically incorporates checking; other authors have also
investigated the search version of decision problems and proposed mechanisms for
its characterization [1,14,17, etc.]. We go one step further and (essentially) ignore
the decision version, concentrating instead on the checking and search versions and
their relationship.

Definition 2.1. A decision problem is a pair $\Pi = (I, M)$, where M is a checker and
I is the set of “yes” instances.

The checker M defines a relation between yes instances and acceptable evidence
for them, providing a concise and explicit description of the relation. Moreover, this
formulation only allows problems for which a checker can be specified, thereby
avoiding existence problems. However, in the following, we shall generally use the
relational formalism explicitly; in that formalism, given relation $\mathcal{R}$, the set of
acceptable instances of the problem is the domain of the relation, which we denote
$D(\mathcal{R})$.

This definition makes a problem into a subjective affair: two different computer
scientists may come to the “Algorithm Shop” with different checkers and thus
require different evidence-generating algorithms, to the point where one may be
satisfied with a constant-time generator and the other may require an
exponential-time one. Many classical problems (such as the known NP-complete problems) have
“obvious” evidence, which we shall call natural evidence; for instance, the natural
evidence for the satisfiability problem is a satisfying truth assignment.

Solving such a problem entails two steps: generating suitable evidence and then
checking the answer with the help of the evidence; this sequence of steps is our
constructive version of the classical decision problem. (Indeed, to reduce our version
to the classical one, just reduce the checker to a trivial one which simply echoes the
first bit of the evidence string.) Thus the complexity of such problems is simply the
complexity of their search and checking components, which motivates our
definition of complexity classes.

Definition 2.2. A constructive complexity class is a pair of classical complexity
classes, $(C_1, C_2)$, where $C_1$ denotes the resource bounds within which the evidence
generator must run and $C_2$ the bounds for the checker. Resource bounds are defined
with respect to the classical statement of the problem, i.e., with respect to the size
of the domain elements; for nondeterministic classes, $C_1$ is omitted, thereby
denoting that the evidence generator may simply guess the evidence.

For instance, we shall define the class $P_c$ to be the pair (P, P), thus requiring both
generator and checker to run in polynomial time; in other words, $P_c$ is the class of
all P-checkable and P-searchable relations. In contrast, we shall define $NP_c$ simply
as the class of all P-checkable relations, placing no constraints on the generation of
evidence. (But note that, since the complexity of checking is defined with respect
to the domain of the relation, the polynomial-time bound on checking translates
into a polynomial bound on the length of acceptable evidence.)

Definition 2.3. A problem (I, M) belongs to a class $(C_1, C_2)$ if and only if the
relation defined by M on I is both $C_1$-searchable and $C_2$-checkable.

These general definitions only serve as guidelines in defining interesting
constructive complexity classes. Counterparts of some classical complexity classes
immediately suggest themselves: since all nondeterministic classes are based on the
existence of checkable evidence, they fit very naturally within our framework. Thus,
for instance, we define

• $NLOGSPACE_c = (-, LOGSPACE)$,

• $NP_c = (-, P)$,

• $NEXP_c = (-, EXP)$.

(Note that, even if NEXP = EXP (i.e., NEXP is EXP-decidable), it does not
follow that NEXP is EXP-searchable; indeed, there exists a relativization where the
first statement is true but the second false [11]. In contrast, NP is P-decidable if and
only if it is P-searchable. This contrast, an aspect of upward separation, adds
interest to our definitions of $NP_c$ and $NEXP_c$.) Deterministic classes, on the other
hand, may be characterized by giving generator and checker the same resource
bounds:

• $LOGSPACE_c = (LOGSPACE, LOGSPACE)$,

• $P_c = (P, P)$,

• $PSPACE_c = (PSPACE, PSPACE)$.

The principal tool in the study of complexity classes is the reduction. Many-one
reductions between decision problems are particularly simple in the classical
framework: one simply maps instances of one problem into instances of the other,
respecting the partition into yes and no instances, so that the original instance is
accepted if and only if the transformed one is. However, in our context, such
reductions are both insufficient and too stringent: we require evidence to go along with
the “answer”, but can also use this evidence to “correct any error” made during
the forward transformation. In essence, the goal of a constructive reduction is to
recover evidence for the problem at hand from the evidence gleaned for the
transformed instance. Thus a many-one reduction between two problems is given
by a pair of maps; similar mechanisms have been proposed by Levin [17] for
transformations among search problems and by Krentel [15] (who calls his “metric
reductions”) for transformations among optimization problems, where the solution
is the value of the optimal solution. Note that the strict preservation of the partition
into yes and no instances is no longer required: we must continue to map yes
instances into yes instances, but can also afford to map no instances into yes instances,
as we shall be able to invalidate the apparent “yes” answer when attempting to
check the evidence.

Definition 2.4. A constructive (many-one) transformation from problem (relation)
$\mathcal{R}_1$ to problem $\mathcal{R}_2$ is a pair of functions (f, g), such that

(1) $x \in D(\mathcal{R}_1) \Rightarrow f(x) \in D(\mathcal{R}_2)$; and

(2) $x \in D(\mathcal{R}_1) \wedge (f(x), y') \in \mathcal{R}_2 \Rightarrow (x, g(x, y')) \in \mathcal{R}_1$.

(Note that, for obvious reasons, the evidence-transforming map, g, must take the
original instance as argument as well as the evidence for the transformed instance.)
With each complexity class we associate a suitable class of transformations by
bounding the resources available for the computation of the two maps: for instance,
transformations within $NP_c$ must run in polynomial time and transformations
within $P_c$ or $NLOGSPACE_c$ in logarithmic space.
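
As an illustration of the pair of maps in Definition 2.4, the following minimal Python sketch transforms independent set into vertex cover, both with natural evidence. The choice of problems, the instance encoding (vertices, edges, k), and all function names are assumptions made for this example; the paper does not work through this particular reduction.

```python
def check_independent_set(instance, chosen):
    """Checker for the source relation: an independent set of size >= k."""
    vertices, edges, k = instance
    chosen = set(chosen)
    return (chosen <= set(vertices) and len(chosen) >= k
            and not any(u in chosen and v in chosen for u, v in edges))

def check_vertex_cover(instance, cover):
    """Checker for the target relation: a vertex cover of size <= k."""
    vertices, edges, k = instance
    cover = set(cover)
    return (cover <= set(vertices) and len(cover) <= k
            and all(u in cover or v in cover for u, v in edges))

def f(instance):
    """Instance map: (G, k) for independent set becomes (G, n - k) for vertex cover."""
    vertices, edges, k = instance
    return (vertices, edges, len(vertices) - k)

def g(instance, cover):
    """Evidence map: needs the original instance as well as the target evidence.

    The complement of a vertex cover is an independent set.
    """
    vertices, _, _ = instance
    return set(vertices) - set(cover)
```

For any cover y' accepted by `check_vertex_cover(f(x), y')`, the set `g(x, y')` is accepted by `check_independent_set(x, ...)`, which is condition (2) for this pair of relations.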

Theorem 2.5. Constructive many-one reductions are transitive.

For  both  resource  bounds  mentioned  above,  the  proof  is  trivial. 

Equipped  with  many-one  reductions,  we  can  proceed  to  define,  in  the  obvious 
way,  a  notion  of  completeness  for  our  classes.  Note  that  a  desirable  consequence 
of  our  definitions  would  be  that  a  problem  complete  for  some  class  in  the  classical 
setting  remains  complete  for  the  constructive  version  of  the  class  when  equipped 
with  the  natural  checker,  if  available.  Since  the  notion  of  natural  checker  is  rather 
vague,  we  establish  this  result  for  some  specific  classes. 

Let  SAT/Nat  be  the  satisfiability  problem  (in  conjunctive  normal  form)  equipped 
with  the  natural  checker  which  requires  a  truth  assignment  as  evidence;  similarly, 
let  CV/Nat  be  the  circuit  value  problem  (see  [16])  equipped  with  evidence  consisting 
of  the  output  of  each  gate  and  let  GR/Nat  be  the  graph  reachability  problem  (see 
[26])  equipped  with  evidence  consisting  of  the  sequence  of  edges  connecting  the  two 
endpoints. 

Theorem 2.6. (1) SAT/Nat is $NP_c$-complete.

(2) CV/Nat is $P_c$-complete.

(3) GR/Nat is $NLOGSPACE_c$-complete.

Proof. We only sketch the proof of the first assertion; the others use a similar
technique, taking advantage of the fact that the generic reductions used in the
classical proofs are constructive.

That SAT/Nat is in $NP_c$ is obvious. Denote by $\phi(M, x)$ the formula produced by
Cook’s transformation when run on a polynomial-time nondeterministic Turing
machine M and input string x. Now let $\Pi$ be some arbitrary problem in $NP_c$; then
there exists some evidence generator for $\Pi$, which can be given by a polynomial-time
nondeterministic Turing machine M'. Let $f(x) = \phi(M', x)$ and let $g(x, y')$ be the
output produced by M' in the computation described by the truth assignment y' for the
formula $\phi(M', x)$. Then the pair (f, g) is easily seen to be a constructive, many-one,
polynomial-time reduction from $\Pi$ to SAT/Nat. □

The polynomial-time reductions found in the NP-completeness literature are
generally constructive, in terms of natural evidence. Thus, for example, the vertex
cover problem with natural evidence (namely, the vertex cover) is $NP_c$-complete.
Similar comments apply for P-complete problems and NLOGSPACE-complete
problems. A natural question at this point is whether or not the following statement
holds, for a classical complexity class C and its constructive counterpart $C_c$:

$\Pi$ is $C_c$-complete if and only if $\Pi \in C_c$ and $D(\Pi)$ is C-complete.

The only if part would obviously hold if we had required our constructive reductions
to preserve the partition between yes and no instances. The if part can be interpreted
as follows in the case of C = NP. We know that weak probabilistic evidence exists
for all problems in NP, as a form of zero-knowledge proof exists for all such sets
[3,10]; we also suspect that, for some sets in NP (such as the set of composite
numbers), some forms of evidence (factors in our example) are harder to find than
others (e.g., witnesses of the type used by Rabin [22]). Thus the if part would imply
that there exists weak deterministic evidence for membership in an NP-complete set.


3.  Self-reducibility  and  oracle  complexity 

Constructive complexity is closely tied to the issue of self-reducibility.
Self-reducibility itself plays an important role in classical complexity theory (see, e.g.,
[1]), but lacks a natural definition in that framework (whence the large number of
distinct definitions). In our constructive formulation, however, self-reducibility has
a very natural definition, which, moreover, very neatly ties together the three facets
of a decision problem.

Definition 3.1. A problem $\Pi$ in some class $(C_1, C_2)$ is (Turing) self-reducible if it
is $C_2$-searchable with the help of an oracle for $D(\Pi)$.

In other words, a problem self-reduces if it can be searched as fast as it can be
checked with the help of a decision oracle; all three facets of a decision problem
indeed come together in this definition. In the case of $NP_c$ problems, our definition
simply states that such problems self-reduce if they can be solved in polynomial time
with the help of an oracle for their decision version; such a definition coincides with
the self-1-helpers of Ko [14] and the self-computable witnesses of Balcázar [1].
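
The textbook self-reduction for satisfiability shows the idea: fix the variables one at a time, asking the decision oracle whether the partially fixed formula is still satisfiable. The following is a minimal Python sketch under stated assumptions: formulas are lists of clauses of signed integers, the names are hypothetical, and `brute_force_oracle` merely stands in for an oracle for the decision version (a real oracle would be a black box).

```python
from itertools import product

def brute_force_oracle(formula):
    """Stand-in for the decision oracle: is the CNF formula satisfiable?"""
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula):
            return True
    return False

def fix_variable(formula, var, value):
    """Substitute var := value, dropping satisfied clauses and false literals."""
    true_lit = var if value else -var
    return [[lit for lit in clause if lit != -true_lit]
            for clause in formula if true_lit not in clause]

def search_with_oracle(formula, num_vars, oracle):
    """Build a satisfying assignment using num_vars + 1 oracle queries."""
    if not oracle(formula):
        return None
    assignment = {}
    for var in range(1, num_vars + 1):
        trial = fix_variable(formula, var, True)
        if oracle(trial):
            assignment[var], formula = True, trial
        else:
            # The formula is satisfiable, so var := False must work.
            assignment[var], formula = False, fix_variable(formula, var, False)
    return assignment
```

Apart from the oracle calls, the work done is polynomial in the size of the formula, which is the sense in which search is no harder than checking once decision is available.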

A natural question to ask about reducibility is whether $D(\Pi)$ is really an
appropriate oracle for $\Pi$: would not a more powerful oracle set make the search
easier? We can show that such is not the case and that, in fact, our choice of oracle
is in some sense optimal.

Definition 3.2. $\Pi$ has oracle complexity at most f(n) if there is a deterministic oracle
algorithm, using any fixed oracle, that makes at most f(n) oracle queries on inputs
of length n and produces acceptable evidence for $\Pi$.

We restrict the oracle algorithm to run within appropriate resource bounds, such
as polynomial time for problems in $NP_c$ and logarithmic space for problems within
$P_c$.

The oracle complexity of some specific problems in $NP_c$ has recently been
investigated. Rivest and Shamir [23] show that the oracle complexity of the language
of composite numbers with natural evidence (i.e., factors) is at most $n/3 + O(1)$.
Luks [18] has shown that graph isomorphism with natural evidence has oracle
complexity $O(\sqrt{n})$. Using the canonical forms technique of Miller [19], the isomorphism
problems for groups, Latin squares, and Steiner triple systems have oracle
complexity $O(\log^2 n)$.

The  following  theorems  summarize  some  properties  of  oracle  complexity  and 
demonstrate  that  our  choice  of  oracle  is  optimal.  We  have  restricted  our  purview 
to  the  three  classes  NP,  P,  and  LOGSPACE,  as  they  are  the  most  interesting  from 
a  practical  standpoint  and  as  they  are  also  representative  of  the  behavior  of  other 
complexity  classes.  The  first  two  theorems  have  trivial  proofs. 

Theorem 3.3. (1) If some $NP_c$-complete problem has logarithmic oracle complexity,
then P = NP.

(2) If some $P_c$-complete problem has logarithmic oracle complexity, then
P = LOGSPACE.

(3) If some $NLOGSPACE_c$-complete problem has logarithmic oracle complexity,
then NLOGSPACE = LOGSPACE.

Theorem 3.4. If some $NP_c$-complete ($P_c$-complete, $NLOGSPACE_c$-complete)
problem has polylogarithmic oracle complexity, then so do all problems in $NP_c$
($P_c$, $NLOGSPACE_c$).

The  next  theorem  indicates  that  the  best  possible  oracle  need  never  be  outside  the 
complexity  class  in  which  the  problem  sits. 

Theorem 3.5. If $\Pi \in NP_c$ ($P_c$, $NLOGSPACE_c$) has oracle complexity less than or
equal to f(n) with some fixed oracle A, then oracle complexity no greater than f(n)
can be achieved for $\Pi$ using an NP-complete (P-complete, NLOGSPACE-complete)
oracle.

Proof. Note that f(n) must be P-time constructible for $\Pi \in NP_c$, with similar
conditions for the other two classes. We only sketch the proof. If the oracle set A sits
within the class, but is not complete for it, we can simply reduce A to a complete
set within the class and thus replace each query to A by an equivalent query to the
complete set. If the set A does not sit within the class, A can be replaced by a set
A' consisting of prefixes of those computation records of the oracle algorithm that
produce acceptable evidence. □

We can pursue the consequences of this last result in terms of oracle choice. Our
first corollary establishes that self-reduction is optimal for complete problems; its
proof follows immediately from our last theorem.

Corollary 3.6. If a problem is complete for one of these three classes, then it
self-reduces as efficiently as it reduces to any oracle at all.

Since it is easy to provide at least one self-reduction, it follows that complete
problems for these three classes, in our constructive formalism, always self-reduce;
such is not the case in the classical setting. Our second corollary shows that
self-reduction can only be suboptimal for “incomplete” problems.

Corollary 3.7. (1) If a problem in $NP_c$ is found that self-reduces less efficiently
than it does to a given oracle, then P ≠ NP.

(2) If a problem in $P_c$ is found that self-reduces less efficiently than it does to a
given oracle, then P ≠ LOGSPACE.

(3) If a problem in $NLOGSPACE_c$ is found that self-reduces less efficiently than
it does to a given oracle, then NLOGSPACE ≠ LOGSPACE.

Note that problems obeying the hypotheses cannot be complete (by our previous
corollary) nor can they belong to a lower complexity class; hence they are
incomplete problems (in the terminology of [3]).

Theorem 3.4 and Corollary 3.6 can be used as circumstantial evidence that a
problem is not complete. For example, since group isomorphism has polylogarithmic
oracle complexity, Theorem 3.4 suggests that it is highly unlikely that group
isomorphism with natural evidence is $NP_c$-complete. Luks’ oracle algorithm for graph
isomorphism, together with Corollary 3.6, provides evidence (further to that of [10])
that graph isomorphism is not NP-complete, since no one knows of a self-reduction
using a sublinear number of queries.

Oracle complexity can be a useful tool in the design of algorithms. Consider the
problem of determining whether a given graph has a vertex cover of size at most k,
for any fixed k. The results of Robertson and Seymour immediately imply that there
exists a quadratic-time algorithm for this problem. Using this (unknown) decision
algorithm as an oracle, we can develop a cubic time search algorithm for the
problem, which we then turn into a simple linear-time algorithm by eliminating the
unknown decision oracle. Select any edge (u, v); delete u and, using the oracle, ask
whether the remaining graph has a vertex cover of size at most k - 1. If so, put u
in the vertex cover and recur; otherwise, put v in the vertex cover and recur. In k
queries, a vertex cover has been generated when any exists. Now one can eliminate
the oracle by trying all 2^k possible sequences of responses; since 2^k is a constant,
the resulting algorithm runs in linear time for each value of k. Better yet, we know
the algorithm!
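
The oracle-elimination step just described can be sketched in a few lines. This is
only an illustration under our own assumptions: the function name
vertex_cover_without_oracle and the representation of the graph as a list of
loop-free edges are hypothetical and do not come from the original presentation.
Each tuple of Boolean answers stands for one of the 2^k possible sequences of
oracle responses.

```python
from itertools import product

def vertex_cover_without_oracle(edges, k):
    """Return a vertex cover of size at most k for a loop-free graph, or None.

    Each tuple produced by product(...) plays the role of one possible
    sequence of oracle answers: True means "put u in the cover",
    False means "put v in the cover".
    """
    for answers in product((True, False), repeat=k):
        remaining = list(edges)          # edges not yet covered
        cover = []
        for put_u in answers:
            if not remaining:            # everything is covered already
                break
            u, v = remaining[0]          # select any uncovered edge (u, v)
            chosen = u if put_u else v
            cover.append(chosen)
            # deleting the chosen vertex removes every edge incident on it
            remaining = [e for e in remaining if chosen not in e]
        if not remaining:
            return cover
    return None
```

For a fixed k the outer loop runs a constant number of times, and each pass scans
the edge list at most k times, which matches the linear-time claim above.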


4.  Blind  oracles  and  reductions 

Many natural self-reduction algorithms actually make no essential use of the
input [6]. Formalizing this observation provides another perspective on oracle
algorithms and a way to measure their efficiency.

Definition 4.1. A blind oracle algorithm is an oracle algorithm which has only
access to the length of the input, not the input itself; on a query to the oracle, the input
is automatically prefixed to the query.

Thus a blind oracle algorithm attempting to produce evidence for string x only
has access to the value |x|; however, on query string y, the oracle actually decides
membership for the string xy.

Definition 4.2. The blind oracle complexity of a problem R, call it boc(R), is the
minimum number of oracle calls made to some fixed, but unrestricted language by
an oracle algorithm (running within appropriate resource bounds) which uncovers
acceptable evidence for the problem.

Information-theoretic arguments immediately give lower bounds on blind oracle
complexity. For example, if R is the equality relation, we clearly have boc(R) = n.
More interesting are bounds for some standard NP-complete problems.

Theorem 4.3. Let VC/Nat be the vertex cover problem with natural evidence (the
cover) and HC/Nat the Hamiltonian circuit problem with natural evidence (the
circuit).

I°g(l*/2|-l)l  £b“(VC/Nat>s[lo*(L„^j)+l»«»  • 

\log\bigl(1 + \tfrac{(n-1)!}{2}\bigr) \le boc(HC/Nat) \le (n - 1) \lceil \log (n - 1) \rceil.

Hence boc(VC/Nat) ∈ Θ(n) and boc(HC/Nat) ∈ Θ(n log n).

Proof. The upper bounds are derived from simple oracle algorithms for each
problem (that for VC actually finds a minimal cover for the problem). The lower bounds
come from simple counts of the number of distinct possible arrangements that may
have to be checked to identify a solution. □
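
To make the linear upper bound for VC/Nat concrete, here is a minimal sketch of a
blind oracle algorithm that recovers a minimum vertex cover with exactly n queries.
It is an illustration only: the class HiddenGraphOracle, its query format, and the
brute-force way it answers are our assumptions standing in for the "fixed, but
unrestricted" oracle language, and the algorithm is not claimed to be the one the
authors had in mind. The search routine sees only n, never the graph.

```python
from itertools import combinations

class HiddenGraphOracle:
    """Hypothetical stand-in for the oracle; it alone sees the input graph."""

    def __init__(self, n, edges):
        self.n, self.edges, self.calls = n, list(edges), 0
        # Brute force: size of a minimum vertex cover (tiny graphs only).
        self.best = next(
            size for size in range(n + 1)
            if any(self._covers(set(c)) for c in combinations(range(n), size))
        )

    def _covers(self, chosen):
        return all(u in chosen or v in chosen for u, v in self.edges)

    def query(self, inside, outside):
        """Is there a minimum cover containing `inside` and avoiding `outside`?"""
        self.calls += 1
        need = self.best - len(inside)
        if need < 0:
            return False
        free = [v for v in range(self.n) if v not in inside and v not in outside]
        return any(self._covers(inside | set(extra))
                   for extra in combinations(free, need))

def blind_min_cover(n, oracle):
    """Blind search: only n is known; one query settles each vertex."""
    inside, outside = set(), set()
    for v in range(n):
        if oracle.query(inside | {v}, outside):
            inside.add(v)       # some minimum cover agrees with this choice
        else:
            outside.add(v)      # then some minimum cover omits v
    return inside
```

The invariant is that some minimum cover contains every vertex in inside and none
in outside, so after n queries the set inside is itself a minimum cover.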

Is  blind  oracle  complexity  preserved  in  some  sense  through  reductions?  The  main 
problem here is that our reductions are not themselves blind and so defeat the
blindness of the oracle algorithms by giving them a description of the input as a
side effect. We need a type of reduction which preserves blindness (such is theory!). Such
a  reduction  must  perforce  use  a  very  different  mechanism  from  that  of  constructive 
many-one  reductions.  Let  us  use  an  anthropomorphic  analogy.  In  the  latter  style  of 
reduction, the scientist with the new problem calls upon a scientist with a known
complete problem, who then serves as a one-shot oracle: the first communicates to
the second the transformed instance and the second returns to the first suitable
evidence for the transformed instance. The second scientist acts in mysterious ways
(i.e., unknown to the first) and thus has attributes of deity. But in a blind reduction,
neither scientist is allowed to see the input and yet the instance given the first
scientist must be transformed into the instance given the second; both scientists then sit
at  the  same  level,  as  humble  supplicants  to  some  all-powerful  deity.  The  reduction 
goes as follows: the first scientist asks the oracle to carry out the transformation
x → f(x) implicitly (neither x nor f(x) will be made known), then uses the oracle
algorithm provided by the second scientist to establish a certificate y' for the
unknown f(x), and finally applies g to recover evidence y for instance x from the
known values of y' and |x|. Since the oracle algorithm of the second scientist
assumes knowledge of the instance size, in this case |f(x)|, it is imperative that
|f(x)| be computable from |x|; hence a blind reduction must be uniform with
respect  to  instance  sizes.  Note  that  what  gets  communicated  in  the  blind  reduction 
is  the  oracle  protocol,  whereas  what  gets  communicated  in  the  normal  many-one 
reduction  is  the  (transformed)  instance. 

Definition 4.4. A blind (constructive) many-one reduction is a many-one
constructive reduction, (f, g), where the first map, f, is length-uniform (i.e., |f(x)| depends
only on |x|) and the second map, g, only has access to the size of the original input.

Now  we  can  check  that  blind  reductions  indeed  preserve  blind  oracle  complexity; 
we  state  this  only  for  the  case  of  most  interest  to  us. 

Lemma 4.5. If R_1 ∈ NP_c blindly reduces to R_2 ∈ NP_c and R_2 has polylogarithmic
oracle complexity, so does R_1.

The proof is obvious from our anthropomorphic discussion above.

With a blind transformation, we can define blind completeness in the obvious
way. Perhaps surprisingly, blindness does not appear to affect the power of
reductions very much, as the following claim shows.

Claim 4.6. All known NP-complete sets are the domains of blindly NP_c-complete
relations.

Obviously, we cannot offer a proof of this statement; we simply remark that all
known reductions between NP-complete problems can easily be made
length-uniform and that, in all of these reductions, the evidence assumes such characteristic
form that, armed only with the length of the original instance and evidence for the
transformed instance, we can easily reconstruct evidence for the original instance.
(This obviously is strongly reminiscent of the observation that all known reductions
among NP-complete problems can be made weakly parsimonious, so that the
number of different certificates for one problem can easily be recovered from the
number of different certificates for the transformed instance.)

A fundamental question in classical complexity theory concerns the density of sets
in various classes (recall that the density of a set is the rate of growth of its
membership as a function of the length of the elements). A set S is sparse if |{x | x ∈ S,
|x| = n}| ≤ p(|x|) for some polynomial p; it has subexponential density (or is
semisparse) if |{x | x ∈ S, |x| = n}| ∈ O(2^{log^k n}), for some positive integer k. It is widely
suspected that NP-complete sets must have exponential density; however, all that
is known at this time is that NP-complete sets cannot be sparse unless P = NP (even
their complexity cores cannot at present be shown to have more than subexponential
density [21]). Blind oracles allow us to prove in our context a much stronger result
about density.

Theorem 4.7. There is no blindly NP_c-complete relation with domain of
subexponential density.

Proof. We have shown that VC/Nat has linear blind oracle complexity, which is
not a polylogarithmic function. Hence the domain of VC/Nat has density greater
than subexponential. Since polylogarithmic oracle complexity is preserved through
blind reductions, it follows that no other NP_c-complete problem can have a
domain of subexponential density. □

Combining  this  result  with  a  proof  for  our  claim  would  allow  us  to  transfer  our 
conclusion  to  NP-complete  sets;  it  might  also  help  in  characterizing  the  relationship 
between  the  sets  NP  and  POLYLOGSPACE. 


5.  Conclusions 

We have presented a proposal for a constructive theory of complexity. By
examining reducibility and oracle algorithms, we have been able to establish a number of
simple results which show that our theory has a sound basis and holds much
promise. Indeed, through the use of blind oracle methods, we have been able to prove
within  our  framework  a  much  stronger  result  than  has  been  shown  to  date  in  the 
classical  theory. 

Much  work  obviously  remains  to  be  done.  Problems  of  particular  interest  to  us 
at  this  time  include  a  further  study  of  the  relationship  between  decision,  checking, 
and  search.  For  instance,  Schnorr  [27]  conjectures  that  there  exist  P-decidable 
predicates  that  are  not  P-searchable  if  we  require  particularly  concise  evidence;  this 
is  the  type  of  question  that  may  be  advantageously  addressed  within  our  framework. 
Another problem of special interest is the characterization of blindly complete
relations in a variety of classes and the connections between blind reductions and
communication complexity. Of potential interest is a study of the higher classes of
complexity (PSPACE, EXP); although these classes can hardly be deemed
constructive from a practical standpoint (any relation that is not P-checkable is only
“solvable” in some abstract sense), the greater resources which they make available
may enable us to derive some interesting results with respect to reducibility.


We have previously established the existence of decision algorithms with low-degree
polynomial running times for a number of difficult combinatorial problems, including many that
can be stated in terms of VLSI layout, placement, embedding, and routing. In this paper, we
turn our attention to the search complexity of these problems. We introduce a general
technique, which we term scaffolding, and illustrate how it is useful in the design of efficient
search algorithms.

CAD tools, layout algorithms, search complexity, VLSI design paradigms


1  INTRODUCTION 

Mathematical  tools  are  now  available  with  which  the  existence  of  asymptotically 
fast  decision  algorithms  can  be  proved  nonconstructively  [1-7]  (a  brief  exposition 
is  contained  in  the  Appendix).  In  a  recent  series  of  papers  [8-11],  we  have  employed 
these  new  tools  to  prove  low-degree  polynomial-time  decision  complexity  for  a 
variety of combinatorial problems. Included on this list are a number of
fixed-parameter layout permutation problems, many of which are well known for their
relevance to VLSI design. Some of these problems were not previously known to
be decidable in polynomial time at all; others were known to be decidable only in
polynomial time by way of brute force or dynamic programming algorithms with
unboundedly high-degree polynomial running times (i.e., the degree of the
polynomial is an unbounded function of the relevant parameter).

For our purposes, it is important to note that, in general, the algorithms proved
to exist with these new tools decide only whether a “yes” or a “no” instance has
been presented. That is, unlike algorithms devised with traditional methods,
algorithms based on these tools typically rely on the existence of finite lists of
“negative” evidence (obstruction sets) rather than on attempts to find more natural
“positive” evidence (satisfactory layouts).

In  this  paper,  we  explore  the  complexity  of  search  problems  (where  the  goal 
is  to  produce  positive  evidence  when  it  exists  [12])  that  correspond  to  the  decision 
problems suggested earlier. Given a decision problem Π_D and the corresponding
search problem Π_S, any method that isolates a solution to Π_S by repeated calls to
an algorithm that answers Π_D is commonly termed a self-reduction. Although we
have previously identified straightforward self-reduction methods that can be
applied to many layout permutation problems [13, 11], we herein develop considerably
more efficient search strategies. To accomplish this, we introduce a general
technique that we call scaffolding, and show how it can be the basis for fast search
algorithms for a number of layout permutation problems, including MIN CUT
LINEAR ARRANGEMENT, GATE MATRIX LAYOUT, and several others.
Because O(n^2) time decision algorithms exist for each problem we consider [10,
11], and because we use at most O(n log n) calls to each decision algorithm, the
search algorithms we devise require only O(n^3 log n) time.

In  the  next  section,  we  discuss  the  self-reduction  process  and  define  a  few 
useful  terms.  In  Section  3,  we  outline  the  main  features  of  scaffolding.  In  the  final 
section,  we  present  the  problem-specific  implementation  details  necessary  to  apply 
scaffolding  to  a  number  of  illustrative  problems. 


2  COMPLEXITY  OF  SEARCHING 

A  natural  computational  setting  in  which  to  consider  the  relationship  between 
search  and  decision  problem  complexities  is  that  of  oracle  computations,  where  an 
oracle  algorithm  A  to  solve  a  search  problem  has  access  to  some  decision  problem 
oracle  O.  More  formally,  an  oracle  algorithm  is  modeled  as  a  random  access 
machine with a special query register and three special states, s_query, s_yes, and s_no.
On entering the state s_query, the machine makes a transition in unit time either to
state s_yes or to state s_no, depending on whether the string q.i, where q denotes the
contents of the query register and i denotes the input, belongs to the oracle
language.

The  overhead  of  A  is  the  time  required  by  A  to  pinpoint  a  solution,  if  any 
exist,  where  each  invocation  of  O  is  charged  only  a  unit-time  cost.  Thus,  the 
overhead  of  A  is  the  time  required  by  A  outside  of  the  running  time  of  the  oracle. 
Since our goal is to devise O(n^3 log n) oracle algorithms, it is necessary to ensure
that none requires more than O(n^3 log n) overhead. [In fact, none will need more
than O(n^2) overhead.]

For an example, consider the vertex permutation problem known as MIN CUT
LINEAR ARRANGEMENT [12]. In this NP-complete problem, we are given a
graph G and a positive integer k, and are asked whether G can be laid out with
its vertices along a straight line so that no orthogonal cutting plane that intersects
the line between any two consecutive vertices (but not on a vertex) ever cuts more
than k edges. Until recently, the fastest known algorithm for both the decision and
the search versions of this problem was a dynamic programming formulation with
time complexity O(n^{k-1}) [14], where n denotes the number of vertices in G. Thus,
MIN CUT LINEAR ARRANGEMENT is in P for any fixed value of k. We have
shown [11], however, that the asymptotic time complexity can be reduced to O(n^2),
for any fixed k. This can be used to obtain an O(n^4) search strategy.
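
The definition above translates directly into code. The following sketch is ours:
the names layout_cutwidth and has_layout_of_width are hypothetical, the graph is
assumed to be given as a list of edges (repeated entries model multiple edges), and
the exhaustive decision routine is exponential, so it only serves to pin down the
problem on very small instances. It is not the O(n^2) decision algorithm of [11].

```python
from itertools import permutations

def layout_cutwidth(layout, edges):
    """Maximum number of edges cut between two consecutive vertices of `layout`."""
    position = {v: i for i, v in enumerate(layout)}
    widest = 0
    for gap in range(len(layout) - 1):   # the cut between positions gap and gap + 1
        crossing = sum(
            1 for u, v in edges
            if min(position[u], position[v]) <= gap < max(position[u], position[v])
        )
        widest = max(widest, crossing)
    return widest

def has_layout_of_width(vertices, edges, k):
    """Exhaustive decision procedure: is some layout of cutwidth at most k?"""
    return any(layout_cutwidth(p, edges) <= k for p in permutations(vertices))
```

For the 4-cycle a, b, c, d, for instance, the layout (a, b, c, d) cuts two edges at
every gap, whereas (a, c, b, d) cuts four edges at its middle gap.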


Theorem  1  [13] 

For any fixed k, a satisfactory solution to MIN CUT LINEAR ARRANGEMENT
can be constructed, if any exist, by an oracle algorithm [with overhead O(n^2)] that
makes O(n^2) calls to an O(n^2) decision oracle for MIN CUT LINEAR
ARRANGEMENT.

If, as in Theorem 1, the oracle language consulted by the oracle algorithm
solving the search problem is precisely the set of “yes” instances for the
corresponding decision problem, then this is termed a self-reduction. A novelty of our
approach is that we shall, in the sequel, obtain efficient oracle algorithms using
oracle languages for closely related, but different decision problems, with no
increase in the degree of the polynomial bounding the asymptotic time complexity
of the decision oracle.

How good is Theorem 1? More specifically, can we do better with a more
sophisticated self-reduction strategy? Clearly, there may be trade-offs between the
amount of overhead time required, the number of oracle calls issued, and the
power (computational cost) of the oracle used.

The goal of the next two sections is to show that, for MIN CUT LINEAR
ARRANGEMENT and related problems, there are oracle languages that are both
recognizable in quadratic time and yet powerful enough to require only O(n log n)
calls by an oracle algorithm [with O(n^2) overhead]. This naturally yields the desired
O(n^3 log n) time search algorithms.

From  a  practical  standpoint,  these  algorithms  are  of  increasing  interest  as  the 
large  constants  and  nonconstructive  nature  of  the  0(n2)  oracles  are  reduced  or 
eliminated  [15-17].  From  a  more  purely  theoretical  perspective,  it  can  be  shown 
that  these  methods  are  “blind,”  needing  access  to  the  input  only  indirectly  through 
queries  to  the  oracle,  and  “fast,”  running  to  within  a  constant  factor  as  fast  as  any 
blind strategy can. (In fact, if one is willing to consider nonblind schemes, even
faster self-reductions are sometimes possible [18, 15].)

As is usual, we intend that n be reserved to denote the length of the input.
For the class of layout permutation problems we focus on, however, we shall abuse
this notation slightly, using n to denote the number of vertices in the input graph.
(This causes no problem, because the family of “yes” instances for each individual
problem permits only a linear number of edges. This bound follows from a result
originally derived in [19]. See [11, 6] for details.)

3  SCAFFOLDING — AN  OVERVIEW 

We now describe a general method for the design of oracle algorithms that use
oracle languages closely related to the relevant decision problems and are,
therefore, recognizable in low-degree polynomial time. This method is particularly
applicable to layout permutation problems concerning width metrics on graphs and
hypergraphs, where the metric is defined to be the minimum, over all permutations
of the vertex or edge set, of an objective function defined on such permutations.
These objective functions arise in a number of NP-complete problems, including
MIN CUT LINEAR ARRANGEMENT, MODIFIED MIN CUT [20], PATH
WIDTH [2], GATE MATRIX LAYOUT [21], VERTEX SEPARATION [22],
SEARCH NUMBER [23], NODE SEARCH NUMBER [24], TWO-DIMENSIONAL
GRID LOAD FACTOR [10], and others. Each of these is known to
possess an O(n^2) time decision algorithm for any fixed width value [10, 11].

Our strategy yields efficient search algorithms for problems that satisfy the
following uniformity condition concerning the complexity of the associated
fixed-parameter decision problem with the width metric w: there is a constant c such
that, for all k, it can be decided in time O(n^c) for an arbitrary graph G whether
w(G) is at most k. The oracle algorithms we shall describe for width-k layout
problems exploit this uniformity condition by employing oracle languages that we
can show are efficiently reducible to the width-k' decision problem, for an
appropriately chosen k' generally greater than k. (Of course, we must ensure that both
the oracle queries and the width-k' problem instances we generate in this manner
have size linear in n.)

Our method proceeds in two stages. First, we describe a convenient oracle
language L that supports an efficient oracle algorithm, solving the width-k search
problem by imitating the well-known binary insertion sort algorithm [25]. Next we
show that L can be recognized in low-degree polynomial time by reducing the
problem of recognizing L to the decision problem for width-k' layout.
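
The first stage can be illustrated with a small sketch of binary insertion driven
by an oracle. Everything named here is hypothetical: make_oracle answers by
exhaustive enumeration (it merely stands in for a recognizer of L, and is usable
only on tiny graphs), and queries are phrased as "can v go into one of the gaps
lo..hi of the current order?", which is our simplification of the "between x and w"
form of query. The point is the query count, not the oracle itself.

```python
from itertools import permutations

def cutwidth_of(layout, edges):
    pos = {v: i for i, v in enumerate(layout)}
    return max((sum(1 for a, b in edges
                    if min(pos[a], pos[b]) <= g < max(pos[a], pos[b]))
                for g in range(len(layout) - 1)), default=0)

def make_oracle(vertices, edges, k):
    """Brute-force stand-in for the oracle language (assumption, tiny inputs only)."""
    good = [p for p in permutations(vertices) if cutwidth_of(p, edges) <= k]
    calls = {"count": 0}

    def oracle(order, v, lo, hi):
        # Is there a width-k layout that extends `order` and places v so that
        # between lo and hi of the already ordered vertices precede it?
        calls["count"] += 1
        for p in good:
            pos = {u: i for i, u in enumerate(p)}
            if any(pos[a] > pos[b] for a, b in zip(order, order[1:])):
                continue                  # p does not extend the current order
            gap = sum(1 for u in order if pos[u] < pos[v])
            if lo <= gap <= hi:
                return True
        return False

    return oracle, calls

def search_layout(vertices, edges, k):
    """Binary insertion: O(log n) oracle calls per vertex, O(n log n) overall."""
    oracle, calls = make_oracle(vertices, edges, k)
    if not oracle([], vertices[0], 0, 0):   # plain decision query
        return None, calls["count"]
    order = []                              # always a permissible order
    for v in vertices:
        lo, hi = 0, len(order)              # some gap in lo..hi admits v
        while lo < hi:
            mid = (lo + hi) // 2
            if oracle(order, v, lo, mid):
                hi = mid
            else:
                lo = mid + 1
        order.insert(lo, v)
    return order, calls["count"]
```

Because each query asks about a whole range of gaps, the search remains correct
even when the permissible positions for v are not contiguous.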

The  reduction  consists  primarily  of  attaching  to  G  a  scaffolding  component  that 
encodes  the  data  structure  for  the  sorting  algorithm  (i.e.,  we  use  L  to  describe 
valid  intermediate  configurations  of  the  data  structure).  In  the  case  of  a  vertex 
permutation  problem,  for  example,  the  vertices  of  G  are  attached  one  by  one  to 
the  template  level  of  the  scaffold  as  a  permissible  sorted  order  of  the  attached 
vertices  is  progressively  extended.  [Such  a  permissible  order  on  a  subset  of  V(G) 
is one that can be extended to a permutation of V(G) with width k or less.] The
determination of where, relative to the already attached vertices, a newly chosen
detached vertex may be inserted in a permissible order is made by means of the
probe level of the scaffold. Attachments at this level encode a determination of
[...] in any permissible order between (not necessarily consecutive) vertices x
and w that are already attached to the template. The overall idea is to use the
scaffold to [...]. An overview of this general construction is depicted in Figure 1.
Illustrative problems will be presented in the next section.

Figure 1. Overview of the scaffolding technique (labeled parts: template, probe,
graph, vertex attachment).

4  ON  THE  IMPLEMENTATION  OF  SCAFFOLDING 
The details of the scaffold component are specific to the particular layout
problem considered. Attachments can take the form of edge [...] MIN CUT LINEAR
ARRANGEMENT.


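Theorem 2

For any fixed k, a satisfactory solution to MIN CUT LINEAR ARRANGEMENT
can be constructed, if any exist, by an oracle algorithm [with overhead O(n^2)] that
makes O(n log n) calls to an O(n^2) decision oracle.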

Proof. For convenience, we shall [...] a layout of a graph [...] the least value of k
for which G has a layout [...] concerning our encodings. Consider the following
decision problem, which will correspond to our desired oracle language L in a way
that we shall make precise shortly.

Input: A tuple (G, V', ≤, x, v, w), where G = (V, E) is a graph, V' ⊆ V, ≤ is a linear
ordering of V', x and w are elements of V', and v is an element of V - V'.

Question: Is there a linear order on V, corresponding to a layout of G with cutwidth
at most k, that extends ≤ such that x ≤ v ≤ w?

The vertices of V - V' are inserted by binary search until a complete layout is
obtained; this can be implemented using the oracle language L = {q.G | q = (V', ≤,
x, v, w) and (G, V', ≤, x, v, w) is a “yes” instance to the decision problem}.

To show that L is recognized in O(n^2) time, we describe a reduction to the problem
of recognizing graphs of cutwidth at most 3k. (We use the value 3k solely because it
is an obvious candidate when augmenting G with template and probe levels. Clearly,
larger values can be used as well.) It suffices to restrict our attention to input q.G, q =
(V', ≤, x, v, w), in which V' ∪ {v} meets every component of G. (Disjoint components
can be laid out separately.)

In quadratic time we will construct a graph G_q that has cutwidth at most 3k if, and
only if, q.G ∈ L. Before presenting the details of this construction, we use Figure 2 to
depict the scaffolding of an arbitrary graph as it might appear during some step of the
algorithm, with k = 2. (We shall work from left to right, ensuring that if, at any step,
the binary insertion of v fails for all of V', then it must be that v can be placed to the
right of all of V'.)

Let T denote the template level of G_q. Its vertex set is {t_0, t_1, ..., t_{n+1}}. For
i = 1, 2, ..., n - 1, a set of k (multiple) edges joins t_i with t_{i+1}. A set of 3k edges joins
t_0 to t_1, and a set of 3k edges joins t_n to t_{n+1}.

Let P denote the probe level of G_q. The vertex set of P is {p_1, p_2, ..., p_5}, and for
i = 1, 2, ..., 4, a set of k edges joins p_i to p_{i+1}. The graph G_q is obtained from T, P,
and G by the following sequence of vertex identifications:

1. The vertices t_1, t_2, ..., t_{|V'|} are identified one-to-one with the vertices of V' in the
order specified by ≤.

2. The vertex t_1 is identified with the vertex p_1, and the vertex t_{|V'|} is identified with
the vertex p_5.

3. The vertex p_2 is identified with the vertex x, and the vertex p_4 is identified with the
vertex w.

4. The vertex p_3 is identified with the vertex v.

Note that since each component of G meets V' ∪ {v}, the resulting graph G_q is
connected. It remains to argue that G_q has cutwidth at most 3k if, and only if, q.G ∈ L.

Suppose that ≤ can be extended to a linear order on V that witnesses q.G ∈ L.
Let σ_T = (t_0, t_1, ..., t_{n+1}) be the sequence of vertices of T. Let σ_P = (p_1, p_2, ..., p_5)
be the sequence of vertices of P. The linear order ≤ on V can be represented by a
sequence of vertices σ_G = (v_{i_1}, v_{i_2}, ..., v_{i_n}). The graphs T' = T - {t_0, t_{n+1}}, P, and G
each have cutwidth at most k with layouts described, respectively, by the sequences σ_{T'}
= (t_1, ..., t_n), σ_P, and σ_G. It follows that if σ represents any layout for G_q that
respects σ_T, σ_P, and σ_G and that has an initial subsequence specified by (t_0, t_1) and a
final subsequence specified by (t_n, t_{n+1}), then σ describes a layout of width at most
3k. Since the given order on V' extends to the order represented by σ_G in a way that
achieves x ≤ v ≤ w, we may choose σ so that the vertices identified in the construction
of G_q from T, P, and G appear consecutively in σ, and, therefore, the cutwidth is
unaffected by these identifications. Thus, G_q has cutwidth at most 3k.

Figure 2. Sample scaffold for MIN CUT LINEAR ARRANGEMENT (vertex
identification shown).

Conversely, suppose G_q has cutwidth bounded by 3k as witnessed by a vertex
sequence σ. Since G_q is connected, and since there are 3k edges between t_0 and t_1 and
between t_n and t_{n+1}, σ must have its initial and final subsequences specified by (t_0, t_1,
..., t_n, t_{n+1}) or the reverse of this, which we can ignore without loss of generality.
Observe that the restriction of σ to the vertex set of T must equal σ_T (in other words,
σ cannot describe a layout in which T is “kinked”), else the cutwidth of the layout
described by σ exceeds 3k, which is impossible. It follows that the restriction of σ to the
vertices of V' respects the given order according to our construction of G_q. Similarly,
the restriction of σ to the vertex set of P must equal σ_P, and, therefore, according to the
identifications made in the construction of G_q, the vertex x must precede the vertex v in
σ and v must precede w. Since there are a total of 2k edges from the levels T and P to
be cut between any two positions between t_1 and t_n in the layout described by σ, and
since the cutwidth of G_q is bounded by 3k, it follows that σ describes a layout of G with
cutwidth at most k. Therefore q.G ∈ L.


MIN  CUT  LINEAR  ARRANGEMENT  has  been  a  useful  example  of  a  vertex 
permutation  problem  amenable  to  this  approach.  We  now  turn  our  attention  to 
GATE MATRIX LAYOUT, an NP-complete problem that was originally posed
in terms of operations on Boolean matrices. Formally, we are given an n x m
Boolean matrix M and an integer k, and are asked whether we can permute the
columns of M so that, if in each row we change to asterisks every zero lying between
the row’s leftmost and rightmost 1, then no column contains more than k 1s and
asterisks. We refer the interested reader to [21] for sample instances, figures, and
additional background on this challenging combinatorial problem.
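
As a small illustration of this definition, the following sketch evaluates one
column permutation. The function name gate_matrix_cost and the list-of-rows matrix
representation are our own assumptions; the code only scores a given permutation
and makes no attempt to find a good one.

```python
def gate_matrix_cost(matrix, column_order):
    """Largest number of 1s and asterisks in any column after permuting columns.

    `matrix` is a list of rows of 0/1 entries; `column_order` lists the
    original column indices from left to right in the permuted matrix.
    """
    per_column = [0] * len(column_order)
    for row in matrix:
        # positions (in the permuted matrix) at which this row has a 1
        ones = [j for j, c in enumerate(column_order) if row[c] == 1]
        if ones:
            # every entry from the leftmost 1 to the rightmost 1 is a 1 or
            # becomes an asterisk, so the row contributes to all those columns
            for j in range(ones[0], ones[-1] + 1):
                per_column[j] += 1
    return max(per_column, default=0)
```

For the matrix with rows (1, 0, 1) and (0, 1, 0), leaving the columns in place
gives cost 2, while moving the middle column to the end gives cost 1.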

We have shown that GATE MATRIX LAYOUT is linear-time equivalent to
a natural edge permutation problem on graphs [8], and we have used this to show
that, for any fixed k, the decision problem is solvable in O(n^2) time. The equivalent
edge permutation problem is described as follows. As before, n denotes the number
of vertices in a graph. Let E_v denote the set of edges incident on a vertex v. For
a bijection f from E to {1, 2, ..., |E|}, the span of v under f is the set of consecutive
integers {i, i + 1, ..., j} for which i is the least element in f(E_v) and j is the
greatest. In the decision problem that we shall henceforth term EDGE
PERMUTATION WIDTH, we are given a graph G and a positive integer k, and are
asked whether there exists such a bijection with the property that, for each edge e,
f(e) is contained in the spans of at most k distinct vertices.
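
The span-based definition can be checked mechanically on small examples. The sketch
below is ours: the name edge_permutation_cost and the encoding of the bijection f as
a list of edge indices (with positions counted from 0 rather than 1) are assumptions
made for illustration.

```python
def edge_permutation_cost(edges, order):
    """Maximum, over positions, of the number of vertices whose span contains it.

    `edges` is a list of (u, v) pairs; `order` lists edge indices so that
    order[p] is the edge placed at position p (positions start at 0 here).
    """
    position = {e: p for p, e in enumerate(order)}
    span = {}                                    # vertex -> (least, greatest)
    for e, (u, v) in enumerate(edges):
        for x in (u, v):
            lo, hi = span.get(x, (position[e], position[e]))
            span[x] = (min(lo, position[e]), max(hi, position[e]))
    return max(
        (sum(1 for lo, hi in span.values() if lo <= p <= hi)
         for p in range(len(edges))),
        default=0,
    )
```

On the path a-b-c-d, placing the three edges in path order gives cost 2, whereas
placing the middle edge last gives cost 3.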

It is interesting that, for the purpose of providing polynomial-time decidability,
it is advantageous to reduce GATE MATRIX LAYOUT to EDGE
PERMUTATION WIDTH (as described in [8]) and work with graphs, while, for the purpose
of devising efficient search strategies, it is more useful to reduce EDGE
PERMUTATION WIDTH to GATE MATRIX LAYOUT (as we shall do with our
scaffolding construction) and work with Boolean matrices.

We continue to state our results in terms of n, the number of vertices in the input
graph. As we have previously pointed out, there is a linear bound on the number
of distinct edges of any “yes” instance, so that n reflects the
size of the input. (This is complicated only trivially by the possible existence of
loops and duplicate edges, which have no effect on this width metric. Therefore,
we make the assumption that the input is restricted to simple graphs.)


Theorem  3 

For any fixed k, a satisfactory solution to EDGE PERMUTATION WIDTH can
be constructed, if any exist, by an oracle algorithm [with overhead $O(n^2)$] that
makes $O(n \log n)$ calls to an $O(n^2)$ decision oracle.

Proof. For convenience, we shall henceforth use the term cost to denote the metric
of relevance to EDGE PERMUTATION WIDTH and GATE MATRIX LAYOUT.
That is, the cost of an edge permutation of a graph G is the maximum number of vertices
whose spans contain a common edge's image under the permutation mapping; the cost
of a column permutation of a Boolean matrix M is the maximum number of 1s and asterisks
in any column.
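
The matrix side of this definition can be sketched in the same way. The code below assumes, as in the earlier description of GATE MATRIX LAYOUT, that after the columns are permuted each row receives asterisks in every position between its leftmost and rightmost 1; the name `layout_cost` is an illustrative choice.

```python
def layout_cost(matrix, column_order):
    """Cost of a column permutation of a Boolean matrix.

    matrix: list of rows, each a list of 0/1 entries.
    column_order: list of column indices giving the new left-to-right order.
    Returns the maximum, over columns, of the number of 1s plus asterisks.
    """
    load = [0] * len(column_order)
    for row in matrix:
        permuted = [row[c] for c in column_order]
        ones = [j for j, entry in enumerate(permuted) if entry == 1]
        if not ones:
            continue
        # Every column from the leftmost 1 to the rightmost 1 holds either
        # a 1 or an asterisk for this row.
        for j in range(ones[0], ones[-1] + 1):
            load[j] += 1
    return max(load, default=0)


m = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]
cost_identity = layout_cost(m, [0, 1, 2])
cost_swapped = layout_cost(m, [1, 0, 2])
```

In this small example, swapping the first two columns removes the asterisk that the first row would otherwise place in the middle column.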

As  in  the  proof  of  Theorem  2,  we  first  define  an  oracle  language  L  that  supports  a 
satisfactory  oracle  algorithm  for  the  problem.  The  decision  problem  that  corresponds  to 
L  is  defined  as  follows. 


Input: A sixtuple $(G, E', \le, e_0, e_1, e_2)$, where $G = (V, E)$ is a graph, $E' \subseteq E$, $\le$
is a linear ordering of $E'$, and $\{e_0, e_1, e_2\}$ is a set of distinct elements of $E$ with
$\{e_0, e_2\} \subseteq E'$.

Question: Is there a linear order on $E$ corresponding to an edge permutation of $G$
with cost at most $k$ that extends $\le$ such that $e_0 < e_1 < e_2$?


Let $L = \{q.G \mid q = (E', \le, e_0, e_1, e_2)$ and $(G, E', \le, e_0, e_1, e_2)$ is a “yes” instance
to the decision problem$\}$. There is a straightforward oracle algorithm with oracle language
L that performs a binary insertion sort to construct a satisfactory edge permutation, when
any exist, by making $O(n \log n)$ oracle calls and requiring $O(n^2)$ overhead.
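
The binary insertion sort can be sketched as follows. This is only an illustration under stated assumptions: the oracle is modeled as a callable `oracle(placed, e0, e1, e2)`, where `placed` is the current linear order on E', and `None` is accepted for a missing bound at either end (the text does not say how the ends are handled). The brute-force oracle is a stand-in for testing on tiny graphs and has nothing to do with the quadratic-time recognition argued below.

```python
from itertools import permutations


def cost(edges, order):
    """Max number of vertex spans containing a common position of `order`."""
    pos = {e: i for i, e in enumerate(order)}
    span = {}
    for e in order:
        for v in edges[e]:
            lo, hi = span.get(v, (pos[e], pos[e]))
            span[v] = (min(lo, pos[e]), max(hi, pos[e]))
    return max(sum(lo <= i <= hi for lo, hi in span.values())
               for i in range(len(order)))


def brute_force_oracle(edges, k):
    """Hypothetical stand-in for the decision oracle (exponential time)."""
    good = [p for p in permutations(range(len(edges))) if cost(edges, p) <= k]

    def oracle(placed, e0, e1, e2):
        for p in good:
            pos = {e: i for i, e in enumerate(p)}
            if any(pos[a] > pos[b] for a, b in zip(placed, placed[1:])):
                continue  # p does not extend the order on the placed edges
            if ((e0 is None or pos[e0] < pos[e1])
                    and (e2 is None or pos[e1] < pos[e2])):
                return True
        return False

    return oracle


def construct_permutation(num_edges, oracle):
    """Binary insertion sort driven by oracle calls; None if no solution."""
    if not oracle([], None, 0, None):
        return None
    placed = [0]
    for e in range(1, num_edges):
        # Invariant: some solution extends `placed` with e strictly between
        # placed[lo] and placed[hi] (index -1 / len(placed) mean "no bound").
        lo, hi = -1, len(placed)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            left = placed[lo] if lo >= 0 else None
            if oracle(placed, left, e, placed[mid]):
                hi = mid
            else:
                lo = mid
        placed.insert(hi, e)
    return placed


path = [("a", "b"), ("b", "c"), ("c", "d")]
solution = construct_permutation(len(path), brute_force_oracle(path, 2))
```

Each edge is inserted with a logarithmic number of oracle calls, which is where the $O(n \log n)$ call count comes from once the number of edges is linear in n.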

To show that L can be recognized in $O(n^2)$ time, we shall describe a scaffolding
reduction to GATE MATRIX LAYOUT, which is decidable in $O(n^2)$ time for each fixed
value of the parameter. To accomplish this, we construct a matrix $M(G, q)$ that has a
cost of at most 5k if and only if $q.G \in L$. (The value 5k is chosen solely for simplicity.)

Before presenting the details of this reduction, we use Figure 3 to depict the scaffolding
matrix $M(G, q)$ corresponding to a sample string q.G. In this sample, k = 2, q
= ({2, 3, 7}, (2, 3, 7), 3, 1, 7), and G = (V, E) with V = {a, b, c, d, e} and E = {(a,
b), (b, c), (a, c), (a, d), (c, d), (a, e), (d, e)}.

The vertices of G are represented by a subset of the rows of the matrix, and the
edges of G are represented by columns of the matrix containing exactly two 1s in that
subset (but other 1s elsewhere). In arguing that L can be recognized in $O(n^2)$ time, it
suffices to restrict our attention to input q.G, $q = (E', \le, e_0, e_1, e_2)$, for which
$E' \cup \{e_1\}$ meets every component of G.

To specify formally the construction of $M(G, q)$ from q.G, let $R = \{r_i \mid 1 \le i \le t\}$
denote the set of rows of $M(G, q)$. $M(G, q)$ will contain $|E| = O(n)$ columns, each
denoted by a subset $c \subseteq R$, with the understanding that c is the set of rows in which that
column contains a 1 and that in all other rows that column contains a 0. The exact value
of the upper bound t used in the indexing of R is implicit in the construction (and is not
important, except that it must be a linear function of n).

The  template  level  T  of  M(G ,  q)  is  the  set  of  columns: 

1.  t<j  —  ^3^}. 

2.  For  i  —  1,  2,  .  .  .  ,  |  Er  j,  f,-  =  {*■(/+ *  *  *  » 

3.  r1£;-|  +  1  =  •  -  *  » ra£'i+5)J- 
To control the maximum number of asterisks that can be introduced in any column
of $M(G, q)$, our reduction also has a ballast level B, an extension of the template and
probe levels. First, define the function $b: E \to \{0, 1, 2\}$ by

$$
b(e) =
\begin{cases}
2, & \text{if } e \in E - (E' \cup W) \\
1, & \text{if } e \in (E' - \{e_0, e_2\}) \cup W \\
0, & \text{if } e \in \{e_0, e_2\}.
\end{cases}
$$

For convenience, we ignore the row indexing. For each edge $e \in E$, B contains a
column $B_e$ of $k \cdot b(e)$ rows disjoint from the rows used in the columns above and disjoint
from any other set $B_{e'}$ corresponding to an edge $e' \ne e$. Thus, each row of B contains
a single nonzero entry.

The matrix $M(G, q)$ is obtained from the union of T, P, H, and B by a process of
identifying certain columns. Two columns $c_1$ and $c_2$ described by row sets are identified
by replacing $c_1$ and $c_2$ with the column represented by the union $c_1 \cup c_2$. The column
identifications we require are as follows:

1. Identify $t_0$ with $p_5$, and $t_{|E'|+1}$ with $p_1$.

2. For $i = 1, 2, \ldots, |E'|$, identify $t_i$ with $H_e$, where e is the ith edge in the linear
ordering $\le$ of $E'$.

3. Identify $p_2$ with $H_{e_2}$, and $p_4$ with $H_{e_0}$.

4. Identify $p_3$ with $H_{e_1}$.

5. For each edge $e \in E$, identify $H_e$ with $B_e$.

Since each component of G meets $E' \cup \{e_1\}$, the matrix $M(G, q)$ is connected, in
the sense that, for any two columns c and c', there is a sequence of columns $c = c_0, c_1,
\ldots, c_r = c'$ such that $c_i \cap c_{i+1} \ne \emptyset$ for $i = 0, 1, \ldots, r - 1$. If c is a column of
$M(G, q)$ that is the result of identifying columns $c_1$ and $c_2$, we may refer to c by either of
the designations $c_1$ or $c_2$. This is often convenient and should cause no confusion.
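
Both notions used here, identifying two columns and connectedness of a matrix, are easy to state in code when each column is represented by its set of rows, as in the construction. The following is a minimal sketch under that representation; the names `identify` and `is_connected` are illustrative.

```python
def identify(c1, c2):
    """Identify two columns given as row sets: the result is their union."""
    return c1 | c2


def is_connected(columns):
    """True if any two columns are linked by a chain of columns in which
    consecutive columns share at least one row."""
    if not columns:
        return True
    seen = {0}
    stack = [0]
    while stack:
        c = stack.pop()
        for d, rows in enumerate(columns):
            if d not in seen and columns[c] & rows:
                seen.add(d)
                stack.append(d)
    return len(seen) == len(columns)


separate = [{1, 2}, {3, 4}]
# Identifying the second column with a column that touches row 2 links them.
joined = [{1, 2}, identify({3, 4}, {2, 5})]
```

The quadratic scan over columns is chosen for brevity only; it plays no role in the running-time claims of the proof.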

We now argue that $M(G, q)$ has a GATE MATRIX LAYOUT cost of at most 5k
if and only if $q.G \in L$. Suppose that the linear order $\le$ on $E'$ extends to a linear order
on E that witnesses $q.G \in L$. Let $\sigma_T = (t_0, t_1, \ldots, t_{|E'|+1})$ be the sequence of columns
of T, and let $\sigma_P = (p_5, \ldots, p_1)$ be the sequence of columns of P. The linear ordering
$\le$ of E naturally describes a sequence $\sigma_H$ of the columns of H with $H_e$ occurring before
$H_{e'}$ in $\sigma_H$ if, and only if, $e \le e'$. The order $\le$ similarly describes a sequence $\sigma_B$ of the
columns of B. Let $T' = T - \{t_0, t_{|E'|+1}\}$ and let $\sigma_{T'} = (t_1, \ldots, t_{|E'|})$.

We make the following observation concerning our construction of $M(G, q)$.

(*) The sets of rows occurring in the columns of levels T, P, H, and B are pairwise
disjoint.

Now consider a sequence $\sigma'$ for $M(G, q)$ that respects $\sigma_T$, $\sigma_P$, $\sigma_H$, and $\sigma_B$. Thus,
$\sigma'$ describes a layout of T, P, H, and B (no identifications). The subsequence $\sigma_{T'}$ of $\sigma'$
describes a “stairstep” layout of the columns of $T'$ that contributes a cost of 2k to each
column of $T'$ and a cost of k to any column occurring between two columns of $T'$ in $\sigma'$.
Similarly, the subsequence $\sigma_P$ of $\sigma'$ describes a stairstep layout of the columns of P
that contributes a cost of 2k to each column of P and a cost of k to any column occurring
between two columns of P in $\sigma'$. The subsequence $\sigma_H$ of $\sigma'$ contributes a cost of at most
k to any column, and the subsequence $\sigma_B$ similarly contributes a cost of at most 2k to
any column. By observation (*), the cost incurred by any column, according to $\sigma'$, is the
sum of the costs separately contributed by $\sigma_T$, $\sigma_P$, $\sigma_H$, and $\sigma_B$.

Suppose $\sigma$ is a sequence for $M(G, q)$ that respects $\sigma_T$, $\sigma_P$, $\sigma_H$, and $\sigma_B$ and with initial
and final columns specified by $(t_0, \ldots, t_{|E'|+1})$. Our assumption that $\le$ extends to an
order on $E(G)$ that witnesses $q.G \in L$ implies that we may choose $\sigma$ so that columns
that are identified in the construction of $M(G, q)$ are consecutively adjacent in $\sigma$.

Consider now the effects of these identifications. Apart from $t_0$ and $t_{|E'|+1}$ (each of
which contains 5k rows), each column of $M(G, q)$ corresponds to a unique edge e of G.
Consider such a column. If $e \in \{e_0, e_2\}$, then $\sigma_B$ contributes a cost of 0, $\sigma_H$ contributes
a cost of at most k, and each of $\sigma_T$ and $\sigma_P$ contributes a cost of 2k. If e is in $(E' - \{e_0,
e_2\}) \cup W$, then $\sigma_B$ contributes a cost of k, one of $\sigma_T$ and $\sigma_P$ contributes a cost of k and
the other contributes 2k, and $\sigma_H$ contributes a cost of at most k. If e is in $E - (E' \cup W)$,
then $\sigma_B$ contributes a cost of 2k, each of $\sigma_T$ and $\sigma_P$ contributes a cost of k,
and $\sigma_H$ contributes a cost of at most k. Thus, each column of $M(G, q)$ under $\sigma$ has
cost at most 5k.

Conversely, suppose there is a sequence $\sigma$ of the columns of $M(G, q)$ with a cost
of at most 5k. This implies that (up to symmetry) the first column of $\sigma$ is $t_0$ and the last
is $t_{|E'|+1}$, since $M(G, q)$ is connected. It also implies that the restriction of $\sigma$ to the columns
of T must be equal to $\sigma_T$ above, since (as in the proof of Theorem 2) any deviation
would cause the width to be greater than 5k. A similar remark applies to the restriction
of $\sigma$ to the columns of P. According to our construction of $M(G, q)$ using column
identifications, the column corresponding to $e_0$ must occur before the column corresponding
to $e_1$ in $\sigma$, and this must occur before the column corresponding to $e_2$. The
argument is concluded by noting that the subsequences of $\sigma$ (by restriction), $\sigma_T$, $\sigma_P$, and
$\sigma_B$ together contribute a cost of 4k to every column except $t_0$ and $t_{|E'|+1}$. Thus, the
restriction of $\sigma$ to the columns of H represents a permutation with a cost of at most k.
By the results of [8], this corresponds directly to an edge permutation demonstrating that
$q.G \in L$.
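
The three cases in the forward direction of this argument can be tallied explicitly. The display below only restates the contributions listed in the text; the labels $c_B$, $c_H$, $c_T$, and $c_P$ are introduced here for the costs contributed by $\sigma_B$, $\sigma_H$, $\sigma_T$, and $\sigma_P$ to the column of an edge e.

```latex
\begin{align*}
e \in \{e_0, e_2\}:
  &\quad c_B + c_H + c_T + c_P \le 0 + k + 2k + 2k = 5k, \\
e \in (E' - \{e_0, e_2\}) \cup W:
  &\quad c_B + c_H + c_T + c_P \le k + k + (k + 2k) = 5k, \\
e \in E - (E' \cup W):
  &\quad c_B + c_H + c_T + c_P \le 2k + k + k + k = 5k.
\end{align*}
```

In every case the ballast contribution $c_B = k \cdot b(e)$ makes up exactly the difference between 4k and what the template and probe levels contribute, which is why the converse direction can conclude that the columns of H alone are laid out with cost at most k.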

Since we have presented useful oracle languages and their associated scaffolding
implementations in considerable detail for the two previous problems (one concerning
vertex permutations and the other concerning edge permutations), we shall
merely provide brief sketches of schemes that suffice for a few additional illustrative
problems. For these and other problems amenable to this approach, we leave the
low-level implementation details to the reader.


Theorem  4 

VERTEX SEPARATION, SEARCH NUMBER, and TWO-DIMENSIONAL GRID

Proof sketch. The last problem on this list is defined and the complexity of its

the problem of recognizing this oracle language to the decision problem for MODIFIED
MIN CUT, with (fixed) parameter $k^* = 3k + 2$, can be defined by modifying the
scaffolding strategy we used there as follows:

1. Delete the vertices $t_0$ and $t_{|V'|+1}$.

2.  Add  four  new  vertices,  a ,  b ,  c,  d ,  and  a  set  of  3k  +  2  (multiple)  edges  joining  each 
of  the  pairs  of  vertices  (a,  b ),  (a,  f,),  (£>,  /:),  (^vj,  c),  (rin,  d ),  and  (c,  zf). 

3. Add additional edges so that for $i = 1, 2, \ldots, |V'| - 1$ and for $j = 1, \ldots, 4$,
there is a set of $k + 1$ edges joining $t_i$ to $t_{i+1}$ and $p_j$ to $p_{j+1}$.

For VERTEX SEPARATION, we can again exploit the oracle language and the
scaffolding strategy used in the proof of Theorem 2. In this case, we choose parameter
$k^* = 3k$ and perform the following modifications:

1. Subdivide each edge once between $t_i$ and $t_{i+1}$ for $i = 0, 1, \ldots, |V'|$.

2. Subdivide each edge once between $p_i$ and $p_{i+1}$ for $i = 1, 2, \ldots, 4$.

By  Theorem  2.2  of  [20],  the  search  problem  for  SEARCH  NUMBER  reduces  to 
the  search  problem  for  VERTEX  SEPARATION.  Thus,  an  oracle  algorithm  for  SEARCH 
NUMBER  may  be  described  easily. 

For the (fixed) parameter values k and d, an oracle algorithm for TWO-DIMENSIONAL
GRID LOAD FACTOR can be based on an oracle language reflecting the
“yes” instances of the following decision problem.

Input: A sixtuple $(G, f, d', n_0, n_1, v)$, where $G = (V, E)$ is a graph of order n, f is
a partial one-to-one map from $V(G)$ to $\{1, 2, \ldots, d\} \times \{1, 2, \ldots, n\}$, $1 \le d' \le d$,
$1 \le n_0 \le n_1 \le n$, and $v \in V$.

Question: Is there a one-to-one map F extending f to all of V that represents an
embedding of G in the $d \times n$ rectangular grid with a load factor of at most k and
for which $F(v) = (i, j)$ with $i = d'$ and $n_0 \le j \le n_1$?

A scaffolding construction that can be used to show that this problem is decidable
in $O(n^2)$ time is sketched as follows, using parameters $k' = 3k$ and $d' = d + 2$.

The template level consists of a modified $(d + 2) \times n$ rectangular grid. Each
peripheral edge is replaced by a set of 3k edges and every other edge is replaced by a
set of 2k edges, except for the edges in rows between the column indices $n_0$ and $n_1$, which
are replaced by a set of k edges each. The probe level consists of three vertices, $u_0$, $u$,
and $u_1$, with k edges between $u_0$ and $u$ and between $u$ and $u_1$. The vertex $u_0$ is identified
with the grid vertex at coordinates $(d', n_0)$, and the vertex $u_1$ is identified with the grid
vertex at coordinates $(d', n_1)$. The vertex u is identified with the vertex v. The vertices
of G are identified with the template vertices in a way that reflects the map f. □


APPENDIX:  NONCONSTRUCTIVE  TOOLS 
AND  THEIR  APPLICATION 

A subdivision of a graph H is any graph obtained from H by replacing edges with
paths. Alternatively, one may view this as the insertion of some number of vertices
of degree 2 into the edges of H. An example is illustrated in Figure A1.
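
As a small illustration of the second view, the sketch below inserts one new vertex of degree 2 into a chosen edge. It assumes a graph stored as a list of vertex pairs; the function name `subdivide` and the vertex label `"x"` are illustrative.

```python
def subdivide(edges, edge_index, new_vertex):
    """Replace edge (u, v) at `edge_index` by the path u - new_vertex - v."""
    u, v = edges[edge_index]
    return (edges[:edge_index]
            + [(u, new_vertex), (new_vertex, v)]
            + edges[edge_index + 1:])


triangle = [("a", "b"), ("b", "c"), ("a", "c")]
# Subdividing one edge of a triangle yields a cycle on four vertices.
four_cycle = subdivide(triangle, 0, "x")
```

Repeating the operation on any sequence of edges produces the general subdivisions described above.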

A graph H is less than or equal to a graph G in the topological order, written
$H \le_t G$, if and only if a graph isomorphic to H can be obtained from G by a series
of these two operations: taking a subgraph and contracting an edge at least one of
whose endpoints has degree 2. A famous result of Kuratowski [26] states that a
graph G is nonplanar if, and only if, $K_5 \le_t G$ or $K_{3,3} \le_t G$. The results we survey
here can be viewed as a vast generalization of Kuratowski's Theorem.

Figure A1. A graph and some of its subdivisions.

Figure A2. Construction demonstrating that $W_4$ is a minor of $Q_3$.

A graph H is less than or equal to a graph G in the minor order, written
$H \le_m G$, if and only if a graph isomorphic to H can be obtained from G by a series
of these two operations: taking a subgraph and contracting an arbitrary edge. For
example, the construction depicted in Figure A2 shows that $W_4 \le_m Q_3$ (although
$W_4 \not\le_t Q_3$).

A family F of graphs is said to be closed under the minor ordering if the facts
that G is in F and that $H \le_m G$ together imply that H must be in F. The obstruction
set for a family F of graphs is the set of graphs in the complement of F that are
minimal in the minor ordering. Therefore, if F is closed under the minor ordering,
it has the following characterization: G is in F if, and only if, there exists no H in
the obstruction set for F such that $H \le_m G$.


Theorem A1 [7]

Any  set  of  finite  graphs  contains  only  a  finite  number  of  minor-minimal  elements. 


Theorem A2 [6]

For every fixed graph H, the problem that takes as input a graph G and determines
whether $H \le_m G$ is solvable in polynomial time.

Figure A3. Construction demonstrating that $C_4$ is immersed in $K_1 + 2K_2$.


Theorems A1 and A2 guarantee only the existence of a polynomial-time decision
algorithm for any minor-closed family of graphs. In particular, no proof of
Theorem A1 can be entirely constructive [15]. Nevertheless, obstruction sets can
often be isolated with problem-specific methods [16].

Letting n denote the number of vertices in G, the general time bound for
algorithms ensured by these theorems is $O(n^3)$. If F excludes a planar graph, then
the bound is $O(n^2)$. Much recent progress has been made to mitigate the enormous
constants of proportionality first reported for such algorithms in [4]. New techniques
greatly reduce the constants in general [17], and techniques specific to layout
problems such as those we have addressed here lower them much more dramatically
[15].

A graph H is less than or equal to a graph G in the immersion order, written
$H \le_i G$, if and only if a graph isomorphic to H can be obtained from G by a series
of these two operations: taking a subgraph and lifting pairs of adjacent edges. For
example, the construction depicted in Figure A3 shows that $C_4 \le_i K_1 + 2K_2$
(although $C_4 \not\le_t K_1 + 2K_2$ and $C_4 \not\le_m K_1 + 2K_2$).
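
The example can be checked mechanically. The sketch below assumes that lifting a pair of adjacent edges uv and vw, with u, v, and w distinct, deletes both edges and adds the edge uw, and it stores a multigraph as a `Counter` of sorted vertex pairs; the names `edge` and `lift` and the vertex labels are illustrative.

```python
from collections import Counter


def edge(u, v):
    """Canonical key for an undirected edge."""
    return tuple(sorted((u, v)))


def lift(graph, u, v, w):
    """Lift the adjacent edges uv and vw: delete both and add the edge uw."""
    if len({u, v, w}) != 3:
        raise ValueError("u, v, and w must be distinct")
    result = Counter(graph)
    for e in (edge(u, v), edge(v, w)):
        if result[e] == 0:
            raise ValueError(f"edge {e} is not present")
        result[e] -= 1
    result[edge(u, w)] += 1
    return +result  # unary plus drops edges whose multiplicity fell to zero


# K1 + 2K2: a center x joined to both endpoints of the disjoint edges ab, cd.
bowtie = Counter(edge(p, q) for p, q in
                 [("x", "a"), ("x", "b"), ("a", "b"),
                  ("x", "c"), ("x", "d"), ("c", "d")])

# Two lifts through the center leave the 4-cycle a-b-d-c-a (x is isolated).
after = lift(lift(bowtie, "a", "x", "c"), "b", "x", "d")
```

Since every cycle of $K_1 + 2K_2$ is a triangle, no sequence of deletions and contractions can produce a 4-cycle, which is consistent with the parenthetical remark that $C_4$ is neither topologically contained in nor a minor of this graph.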

The relation $\le_i$, like $\le_m$, defines a partial ordering on graphs with the associated
notions of closure and obstruction sets.

Theorem  A3  [3] 

Any set of finite graphs contains only a finite number of immersion-minimal elements.

Theorem A4 [11]

For every fixed graph H, the problem that takes as input a graph G and determines
whether $H \le_i G$ is solvable in polynomial time.

Theorems A3 and A4 also guarantee only the existence of a polynomial-time
decision algorithm for any immersion-closed family of graphs. Our proof of Theorem
A4 yields a general time bound of $O(n^{h+3})$, where h denotes the order of the
largest graph in the relevant obstruction set. With knowledge of specific minors
excluded by an immersion-closed family F, however, the time complexity for determining
membership can, in many cases, be reduced to $O(n^2)$ [11] by bounding
the tree-width [5] of F.

For an application of Theorems A1 and A2, consider GATE MATRIX LAYOUT,
a combinatorial problem arising in several VLSI layout styles, including gate
matrix, programmable logic arrays under multiple folding, Weinberger arrays, and
others. Although the general problem is NP-complete, we have shown that, for
any fixed value of k, an arbitrary instance can be mapped to an equivalent instance
with only two 1s per column, then modeled as a graph on n vertices such that the
family of “yes” instances is closed under the minor order and excludes a planar
graph.

Theorem  A5  [8] 

For  any  fixed  k,  GATE  MATRIX  LAYOUT  can  be  decided  in  0(n2)  time. 

For an application of Theorems A3 and A4, consider TWO-DIMENSIONAL
GRID LOAD FACTOR, a problem that is a two-dimensional analog of MIN CUT
LINEAR ARRANGEMENT [12]. The minimum load factor of G relative to C is
the minimum, over all embeddings of G in C, of the maximum number of paths
in the embedding that share a common edge in C. In the TWO-DIMENSIONAL
GRID LOAD FACTOR problem, we are given a graph G and integers k and w,
and are asked whether the minimum load factor relative to an infinite-length,
width-w grid is less than or equal to k.

Although the general problem is NP-complete even when w is fixed, we have
shown that when both k and w are fixed, the family of “yes” instances is closed
under the immersion order and has bounded tree-width.

Theorem A6 [11]

For any fixed k and w, TWO-DIMENSIONAL GRID LOAD FACTOR can be
decided in $O(n^2)$ time.

Abstract: A parallel algorithm is time-space optimal if it
achieves optimal speedup and if it uses only a constant amount
of extra space per processor even when the number of processors
is fixed. Previously published parallel merging and sorting
algorithms fail to meet at least one of these criteria. In this
paper, we present a parallel merging algorithm that, on an EREW
PRAM with k processors, merges two sorted lists of total length
n in $O(n/k + \log n)$ time and $O(k)$ extra space, and is thus
time-space optimal for any value of $k \le n/(\log n)$. We also
describe a stable version of our parallel merging algorithm that
is similarly time-space optimal on an EREW PRAM. These two
parallel merges naturally lead to time-space optimal parallel
sorting algorithms.

Index Terms: Block rearranging, internal buffering, memory
management, merging and sorting, parallel computation,
time-space optimality.

I.  Introduction 

The quest for efficient parallel merging and sorting algorithms
has been a long-standing topic of intense interest,
as evidenced by the impressive volume of literature published
on this subject (see, for example, [1], [6], [17] for recent
surveys). Much of the focus has been on the search for methods
that are optimal in the classic sense that asymptotically optimal
speedup is attained.¹ Indeed, a number of parallel algorithms
have been proposed that are optimal under this criterion,
including those found in [3], [7]-[9], [15], [20], and [22].

Curiously, and quite unlike the case for sequential algorithms,
very little attention seems to have been paid to
space management issues. Some of this phenomenon can
perhaps be attributed to the fact that much of what is known
about parallel algorithms is relatively new. Accordingly, less
time has elapsed for practical problems of implementation
to become widely known. (See, for example, the formidable
difficulties in memory management that have recently been
encountered when an attempt has been made to implement
NC-style algorithms² on hypercubes with 16 and 256 nodes
[5].) Another contributing factor may be that memory has
become so inexpensive during the last few years that it is
often easy simply to ignore it. In any event, space utilization
continues to be a critical aspect in many applications, even
for sequential processing; this criticality is only heightened in
parallel processing systems when the number of processors is
bounded.

¹ A parallel method attains asymptotically optimal speedup if the product
of the number of processors it employs and the amount of time it takes is
within a constant factor of the time required by a fastest sequential algorithm.

None of the previously published parallel merging and sorting
strategies are time-space optimal. That is, none achieve
optimal speedup and, at the same time, require only a constant
amount of extra space per processor even when the number of
processors is fixed. We remark that, from a consideration of
time alone, these algorithms represent an acceptable approach,
mirroring one reason for the popularity of the parallel
random-access machine (PRAM) model. Specifically, if the number of
processors is fixed, then as the problem size grows, an NC
algorithm can be “scaled down,” so that each real processor
needs merely to emulate multiple virtual processors, thereby
accounting for the massive parallelism inherent in the design
of the algorithm. Unfortunately, however, space requirements
in this scenario tend to “blow up,” unless the extra space
required by each real processor is constant, independent of
the growing problem size. Added cause for concern is that,
even if enough global memory is available, the more shared
memory accesses a program makes, the more message traffic
is placed on whatever interconnection network is used to
realize the shared memory, with an attendant downgrading
of the overall system's performance. Incorporating secondary
memory devices into this picture naturally leads to additional
problems [18], to be avoided as long as main memory need
not be squandered on temporary extra storage.

From the foregoing discussion, we conclude that any genuine
attempt to minimize extra space dictates that the total
number of extra storage cells required by each processor be
constant, even when the number of processors available is
bounded by some constant k whose value is independent of
the size of a problem instance, n. (This is in contrast to work
such as that described in [19], in which constant extra space
is employed at each processor, but the number of processors
is assumed to be $\Theta(n^2)$.) Moreover, as an attractive side effect
of attempting to minimize extra space, a bounded number of
processors reflects more faithfully any real parallel computing
environment.

In this paper, we present for the first time a parallel merging
algorithm that, on an exclusive-read exclusive-write (EREW)
PRAM, merges two sorted lists in $O(n/k + \log n)$ time and
constant extra space per processor, and hence is time-space
optimal for any value of $k \le n/(\log n)$. We also describe
how this gives rise to a stable³ version of our parallel merging
algorithm that is similarly time-space optimal on an EREW
PRAM. We observe that (our technique for achieving) stability
incurs two penalties: a slightly more complicated algorithm
and somewhat larger constants of proportionality. These two
parallel merges naturally lead to time-space optimal parallel
sorting algorithms.

² A problem is said to be in NC if it possesses a parallel algorithm that,
for any problem instance of size n, employs a number of processors bounded
by some polynomial function of n and requires an amount of time bounded
by some polylogarithmic function of n.

In the next section, we briefly review the main features
of a recently published linear-time in-place sequential merge.
Although a direct parallelization of this method is not possible,
its overall structure is helpful in simplifying the presentation
of our parallel algorithm, which we describe in detail in
Section III. As we demonstrate in that section, a major factor
in our algorithm's asymptotic time-space optimality is the
introduction of a useful technique that is based on what we dub
a displacement table. We next move on to the subject of stable
merging, describing in Section IV how some relatively simple
modifications to our parallel merge can be exploited to yield
a stable time-space optimal parallel algorithm. Extensions to
sorting and open topics for future research are discussed in
the final section.


II.  A  Review  of  Time-Space 
Optimal  Sequential  Merging 

To simplify the presentation of our time-space optimal
parallel algorithm in the next section, it is useful first to review
at least briefly the recently published and relatively simple
linear-time, in-place sequential merge from [10]. As expected,
some of the operations that are easy to perform sequentially are
difficult to perform in parallel. Interestingly, on the other hand,
some of the operations that are difficult to perform sequentially
are easy to perform in parallel. On the whole, however, it
turns out that a direct parallelization of this novel sequential
method is not possible. Nevertheless, its overall structure can
be used to guide our thinking so that, with the aid of our
parallel displacement table technique to be presented later, we
can direct all available processors to work efficiently in unison.

The (sequential) optimality attained with respect to both
time and space inherently relies on the related notions of block
rearranging and internal buffering. To get a feel for the general
way in which such a strategy works, it is helpful to view a list
containing n records as a collection of $\Theta(\sqrt{n})$ blocks, each
of size $\Theta(\sqrt{n})$. This approach allows us to employ one block
as the (internal) buffer to aid in resequencing the other blocks
of the two sorted sublists and then merging these blocks into
one sorted list. Since only the contents of the buffer and the
relative order of the blocks need ever be out of sequence,
linear time is sufficient to achieve order by straight-selection
sorting [14] both the buffer and the blocks (each sort involves
$O(\sqrt{n})$ keys). We refer the interested reader to [10]-[13] for
extensive background, related results, and additional details on
block rearranging and internal buffering methods.

³ A merging algorithm is stable if it preserves the original relative order of
records with identical keys.

We  note  that,  for  the  sake  of  complete  generality,  we  allow 
neither  the  key  nor  any  other  part  of  a  record  to  be  modified  by 
our  algorithms.  Such  is  necessary,  for  example,  when  records 
are  write-protected  or  when  there  is  no  explicit  key  field  within 
each  record,  but  instead  a  record’s  key  is  a  function  of  one  or 
more  of  its  data  fields. 

Let L denote a list containing two sublists to be merged, each
with its keys in nondecreasing order. We shall make a few
simplifying assumptions about L to facilitate the discussion.
(See [10] for a complete exposition of the algorithm, an
example, and the $O(\sqrt{n})$ time and $O(1)$ space implementation
details necessary for handling arbitrary inputs.)

We assume that n is a perfect square, and that the records
of L have already been permuted so that the $\sqrt{n}$ largest-keyed
records are at the front of the list (their relative order there is
immaterial), followed by the remainders of the two sublists,
each of which we now assume contains an integral multiple of
$\sqrt{n}$ records in nondecreasing order. Therefore, we can view L
as a series of $\sqrt{n}$ blocks, each of size $\sqrt{n}$. The leading block
will be used as an internal buffer to aid in the merge.

The first step is to sort the √n − 1 rightmost blocks
by their tails (rightmost elements), after which their tails
form a nondecreasing key sequence. (In this setting, selection
sort requires only O(n) key comparisons and record
exchanges.) Records within a block retain their original relative
order.
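
As a rough illustration of this first step, the following sketch selection-sorts whole blocks by their tails. It is not taken from [10]; the function name, the flat-list representation, and the convention that block 0 is the buffer are assumptions made here for illustration only. It does not address ties between equal tails.

```python
def sort_blocks_by_tail(records, block_size, first_block=1):
    """Selection-sort whole blocks of `records` in place by their last key.

    Blocks before `first_block` (here, the buffer block 0) are left alone.
    Records inside a block keep their relative order because blocks are
    exchanged as units.
    """
    num_blocks = len(records) // block_size
    for i in range(first_block, num_blocks - 1):
        # Find the block with the smallest tail among blocks i..end.
        best = i
        for j in range(i + 1, num_blocks):
            tail_j = records[(j + 1) * block_size - 1]
            tail_best = records[(best + 1) * block_size - 1]
            if tail_j < tail_best:
                best = j
        if best != i:
            # Exchange the two blocks record by record.
            for t in range(block_size):
                a = i * block_size + t
                b = best * block_size + t
                records[a], records[b] = records[b], records[a]
    return records
```

With √n blocks of √n records each, the nested loops perform O(n) tail comparisons, and the at most √n block exchanges move O(n) records in total.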

The  second  step,  which  is  the  most  complex,  is  to  direct  a 
sequence  of  series  merges.  An  initial  pair  of  series  of  records  to 
be  merged  is  located  as  follows.  The  first  series  begins  with  the 
head of block 2 and terminates with the tail of block i, i ≥ 2,
where  block  i  is  the  first  block  such  that  the  key  of  the  tail 
of  block  i  exceeds  the  key  of  the  head  of  block  i  +  1.  The 
second  series  consists  solely  of  the  records  of  block  i  +  1.  The 
buffer  is  used  to  merge  these  two  series.  That  is,  the  leftmost 
unmerged  record  in  the  first  series  is  repeatedly  compared  to 
the  leftmost  unmerged  record  in  the  second,  with  the  smaller- 
keyed  record  swapped  with  the  leftmost  buffer  element.  Ties 
are  broken  in  favor  of  the  leftmost  series.  (In  general,  the  buffer 
may  be  broken  into  two  pieces  as  the  merge  progresses.)  This 
task  is  halted  when  the  tail  of  block  i  has  been  moved  to  its 
final  position. 
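
The swapping discipline used in a series merge can be sketched as follows. This is a simplified illustration rather than the procedure of [10]: it merges a single pair of sorted runs, assumes the buffer sits immediately to their left, and assumes the second run is no longer than the buffer. The function and parameter names are hypothetical.

```python
def merge_with_internal_buffer(a, buf_len, x_len, y_len):
    """Merge sorted runs X and Y into the space that starts at the buffer.

    Layout on entry: a[0:buf_len] is the buffer (arbitrary keys),
    followed by X (x_len sorted records) and Y (y_len sorted records).
    Requires y_len <= buf_len so the output never overtakes X.
    """
    assert y_len <= buf_len
    out = 0                      # next output slot (leftmost buffer element)
    x, x_end = buf_len, buf_len + x_len
    y, y_end = x_end, x_end + y_len
    while x < x_end or y < y_end:
        # Ties favor the leftmost series X.
        take_x = y == y_end or (x < x_end and a[x] <= a[y])
        src = x if take_x else y
        a[out], a[src] = a[src], a[out]   # swap, so no record is overwritten
        out += 1
        if take_x:
            x += 1
        else:
            y += 1
    return a
```

On exit the merged records occupy the leftmost x_len + y_len positions and the buffer records, possibly permuted, occupy the rest. Because records are only ever swapped, no record is modified or lost.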

The  next  two  series  of  records  to  be  merged  are  now  located. 
This  time,  the  first  begins  with  the  leftmost  unmerged  record 
of  block  i  +  1  and  terminates  as  before  for  some  j  >  i.  The 
second  consists  solely  of  the  records  of  block  j  +  1.  The 
merge  is  resumed  until  the  tail  of  block  j  has  been  moved. 
This  process  of  locating  series  of  records  and  merging  them 
is  continued  until  a  point  is  reached  at  which  only  one  such 
series  exists,  which  is  merely  shifted  left,  leaving  the  buffer 
in  the  last  block. 

The  final  step  is  to  sort  the  buffer,  thereby  completing  the 
merge  of  L. 

O(n) time suffices for this entire procedure, because each
step requires at most linear time. O(1) space suffices as well,
since the buffer was internal to the list, and since only a
handful of additional pointers and counters are necessary.

III. Time-Space Optimal Parallel Merging on the EREW PRAM Model

Note  that,  exclusive  of  implementation  details  for  extracting 
the internal buffer and for handling lists and sublists of arbitrary
sizes, the sequential algorithm just described comprises
three steps: block sorting, series merging, and buffer sorting.
Unfortunately,  these  steps  do  not  appear  to  permit  a  direct 
parallelization,  at  least  not  one  that  requires  only  constant 
extra  space  per  processor.  In  particular,  the  internal  buffer 
is  instrumental  in  the  series  merging  step,  dictating  a  block 
size of Θ(√n) that in turn severely limits what can be
accomplished  efficiently  in  parallel. 

Optimistically, however, observe that if we could only
devise a time-space optimal method to

1) use bigger blocks (namely, one block for each of the k
processors, giving rise to a block size of n/k, a value
that might be unboundedly greater than √n) and

2) reorganize the file so that the problem is reduced to
one of k local merges (that is, replace the contents of
each block with two sublists, one from each of the two
original sublists in L, so that the largest key in block i
is no greater than the smallest key in block i + 1, for
1 ≤ i < k),

then  we  could  complete  a  time-space  optimal  merge  of  L  by 
simply  directing  each  processor  to  merge  the  contents  of  its 
own  block  using  the  algorithm  sketched  in  the  last  section. 
This  observation  is  the  genesis  of  the  parallel  method  we  shall 
now  present. 
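
To make the target of the reorganization concrete, the sketch below checks the block-boundary condition and then performs the k local merges. It is only an illustration: each block is modeled as a pair of sorted Python lists (a hypothetical representation), every block is assumed non-empty, and the standard-library heapq.merge stands in for the in-place sequential merge.

```python
from heapq import merge

def boundary_condition_holds(blocks):
    """True if the largest key of block i is <= the smallest key of block i+1.

    Each block is a pair (part1, part2) of sorted lists, one taken from
    each original sublist; every block is assumed to be non-empty.
    """
    for (a1, a2), (b1, b2) in zip(blocks, blocks[1:]):
        if max(a1 + a2) > min(b1 + b2):
            return False
    return True

def local_merges(blocks):
    """Merge each block independently and concatenate the results."""
    out = []
    for part1, part2 in blocks:
        out.extend(merge(part1, part2))   # one local merge per processor
    return out
```

When the boundary condition holds, the concatenation of the locally merged blocks is the merged list, which is why the local merges need no further coordination among processors.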

Ignoring  for  the  moment  implementation  details  for 
dealing  with  lists  and  sublists  of  arbitrary  sizes  (these 
details  will  be  addressed  at  the  end  of  this  section), 
our parallel method comprises these five steps: block
sorting, series delimiting, displacement computing, series
splitting, and local merging. Since the last step of our
algorithm  (local  merging)  is  easy  from  a  parallel  standpoint, 
it  is  not  surprising  that  the  earlier  steps  are  relatively 
complicated,  requiring  a  careful  coordination  of  all  processors 
to  achieve  efficiently  the  desired  reorganization  of  the 
file. 

To  facilitate  discussion,  let  us  temporarily  assume  that  the 
number  of  records  in  each  of  the  two  sublists  in  L  is  evenly 
divisible  by  k.  We  shall  refer  to  a  record  or  block  from  the 
first sublist of L as an L1 record or an L1 block. We shall use
the  terms  L2  record  and  L2  block  in  an  analogous  fashion  for 
elements  from  the  second  sublist. 

Block  Sorting:  We  first  view  L  as  a  sequence  of  k  blocks, 
each  of  size  n/k.  See  Fig.  1,  in  which  we  employ  a  handy 
pictorial representation for L, using the vertical axis to indicate
increasing  key  values  and  the  horizontal  axis  to  indicate 
increasing  record  indexes.  We  seek  to  sort  these  blocks  by  their 
tails.  This  is  a  relatively  simple  chore  if  one  is  willing  to  settle 
for  a  concurrent-read  exclusive-write  (CREW)  algorithm.  For 
example, we could begin by directing each L1 (L2) processor
to perform a binary search on the L2 (L1) sublist, comparing its
block’s  tail  against  the  tails  in  that  sublist.  In  order  to  sort  the 
blocks  efficiently  on  the  EREW  model,  we  adopt  a  slightly 
more  complex  strategy. 


Fig. 1. Divide lists into blocks. (Panel labels: L1, L2.)

Fig. 2. Block sorting. (Block labels, left to right: L1 L1 L1 L2 L1 L2 L2 L2 L1 L2.)


We  first  direct  each  processor  to  set  aside  a  copy  of  the 
tail  of  its  block  and  its  index  (an  integer  between  1  and  k, 
inclusive).  We  can  now  merge  the  k  tail  copies  (dragging  along 
the  indexes)  by  reversing  in  parallel  the  copies  from  the  second 
sublist  and  then  invoking  the  well-known  bitonic  merge  [4],  a 
task requiring O(log k) time and O(k) total extra space.
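
The reverse-then-merge idea can be sketched sequentially as follows. This is an illustrative simulation, not the EREW implementation: it assumes the combined number of items is a power of two, and it represents each tail copy as a hypothetical (key, block index) pair so that the index is dragged along with the key.

```python
def bitonic_merge(first, second):
    """Merge two sorted lists whose combined length is a power of two."""
    seq = list(first) + list(reversed(second))   # ascending, then descending
    n = len(seq)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    half = n // 2
    while half >= 1:
        # One stage: the compare-exchanges below touch disjoint pairs, so a
        # parallel machine could perform them all at once.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                if seq[i] > seq[i + half]:
                    seq[i], seq[i + half] = seq[i + half], seq[i]
        half //= 2
    return seq

# Tails of three L1 blocks and five L2 blocks, tagged with block indexes.
l1_tails = [(3, 1), (8, 2), (12, 3)]
l2_tails = [(5, 4), (6, 5), (9, 6), (14, 7), (20, 8)]
new_order = [index for _, index in bitonic_merge(l1_tails, l2_tails)]
```

The loop runs for log2 of the combined length stages, which corresponds to the O(log k) parallel time quoted above when each stage is executed by k processors.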

After  this  merge  is  completed,  each  processor  knows  the 
index  of  the  block  it  is  to  receive.  With  the  use  of  but  one 
extra  storage  cell  per  processor,  it  is  now  a  simple  matter  for 
the  processors  to  acquire  their  respective  new  blocks  in  parallel 
without  memory  conflicts,  one  record  at  a  time  (say,  from  the 
first record in a block to the last). This task requires O(n/k)
time and O(k) extra space.

This  completes  the  block  sorting  step,  and  has  required 
O(n/k + log k) time and constant extra space per processor.
See  Fig.  2. 

Series  Delimiting:  As  with  the  sequential  method,  it  is 
helpful  at  this  point  to  think  of  the  list  as  containing  a 
collection  of  pairs  of  series  of  records,  with  each  pair  of  series 
to  be  merged.  (Of  course,  we  cannot  now  merely  mimic  the 
sequential  series  merging  step.  If  there  are  large  series,  then  it 
would  take  too  long  to  merge  them;  if  there  are  large  blocks, 
then  it  would  take  too  long  to  sort  any  type  of  internal  buffer.) 
We  require  a  somewhat  more  refined  definition  of  “series,” 
however,  because  we  must  insist  that  pairs  of  series  do  not 
overlap  one  another.  The  first  and  second  series  of  any  given 
pair  meet  as  before,  where  the  tail  of  block  i  exceeds  the  head 
of block i + 1. To determine where pairs meet each other,
we  now  use  the  term  “breaker”  to  denote  the  first  record  of 
block  i  +  1  that  is  no  smaller  than  the  tail  of  block  i.  Thus, 
the  first  series  of  a  pair  needs  only  to  begin  with  a  breaker, 
and  the  second  series  of  that  pair  needs  only  to  end  with  the 
record  immediately  preceding  the  next  breaker.  This  notion  is 
illustrated  in  Fig.  3.  Because  each  pair  of  series  is  made  up 
either of a portion of an L1 block followed by zero or more
full L1 blocks and a portion of an L2 block, or a portion of
an L2 block followed by zero or more full L2 blocks and a
portion of an L1 block, and because these two configurations
are  symmetric,  we  shall  henceforth  address  only  the  former 
case  in  this  and  subsequent  figures. 

For a processor to determine whether its block contains a
second series, it simply compares its head to its left neighbor’s
tail. If this comparison reveals that the processor does contain
such a series, then it invokes a binary search to locate its
breaker (it must have one, since the blocks were first
sorted by their tails) and broadcasts (see footnote 4) the breaker’s location
first to its left and then to its right. By this means, a processor
learns the location of the breaker to its immediate right and
the location of the breaker to its immediate left.

Fig. 3. Delimiting a pair of series to be merged. (a) Locating the breakers. (b) Pair of resulting series.

Fig. 4. A pair of series and the corresponding displacement table entries. (a) Pair of series, p = 3. (b) Displacement table entries.

From  this  it  follows  that  every  processor  can  correctly 
delimit  the  one  or  two  pairs  of  series  that  are  relevant  to 
the contents of its block in O(log (n/k) + log k) time and
constant  extra  space  per  processor. 
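
The local part of series delimiting (the head-versus-tail test followed by a binary search for the breaker) might look like the sketch below. The function name and the list representation of a block are assumptions made for illustration; the broadcasting of breaker locations is not modeled.

```python
from bisect import bisect_left

def find_breaker(prev_tail, block):
    """Index of the first record in `block` that is no smaller than prev_tail.

    Returns None when the block holds no second series, that is, when its
    head is already no smaller than the left neighbor's tail.
    """
    if block[0] >= prev_tail:
        return None
    # The blocks were sorted by their tails, so block[-1] >= prev_tail and
    # the search always lands inside the block.
    return bisect_left(block, prev_tail)
```

The binary search inspects O(log (n/k)) records, which accounts for the first term in the time bound above.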

Displacement Computing: Recall that our goal is to reorganize
the file so that local merging is possible as a final
step.  This  requires  an  efficient  parallel  means  for  splitting 
each  pair  of  series  among  the  processors  that  are  in  charge 
of  the  pair’s  blocks.  In  order  to  accomplish  this,  we  shall  now 
introduce  what  we  term  a  displacement  table,  with  one  table 
entry  to  be  stored  at  each  processor.  In  this  table  we  seek  to 
enumerate,  for  each  processor  with  a  block  (or  portion  thereof) 
from  the  first  series,  the  number  of  records  from  the  second 
series  that  would  displace  records  in  that  block  if  there  were 
no  other  records  in  the  first  series.  Thus,  a  displacement  table 
is  of  immediate  use  in  the  next  step  (series  splitting),  because 
processor i needs only to know its entry, E_i, and the entry for
processor i − 1, E_{i-1}. From these two values it is easy for
processor i to determine the number of its records that are to
be displaced by records from the left (namely, E_{i-1}) and the
number that are to be displaced by records from the second
series (namely, E_i − E_{i-1}). See Fig. 4.
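
The arithmetic on neighboring table entries can be sketched as follows. The entries in the usage line are made-up values for a pair of series with p = 3, not the ones shown in Fig. 4, and the function name is hypothetical. The entry to the left of the first block is taken to be zero.

```python
def incoming_counts(entries):
    """For each first-series processor, return (from_left, from_second).

    `entries` lists the displacement table entries E_f, ..., E_{f+p-1}
    in processor order.
    """
    counts = []
    prev = 0                      # no block lies to the left of the first one
    for e in entries:
        counts.append((prev, e - prev))
        prev = e
    return counts

example = incoming_counts([2, 3, 7])
```

For the made-up entries 2, 3, 7, the middle processor receives 2 records from its left neighbor and 3 − 2 = 1 record from the second series, and passes 3 of its own records to the right.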

As with the block sorting step, things are relatively simple
if one is willing to settle for a CREW algorithm. For example,
we could begin by directing each processor whose block
contains records from the first series to perform a binary search
on the second series. In order to compute the displacement
table entries efficiently on the EREW model, we adopt a
considerably more complicated strategy. In particular, we must
solve a nontrivial processor allocation problem [20]. We agree
with the sentiment expressed by others (see, for example, [7],
[16]) that details relevant to this thorny subject warrant a
careful exposition.

4 A convenient algorithm for this type of broadcasting can, for example,
be found in [21, p. 234], where it is termed a “data distribution algorithm.”
Alternatively, such broadcasting can be efficiently accomplished with parallel
prefix computation.

For an arbitrary pair of series, let f denote the index of
the processor handling the first record in the first series, and
let p denote the number of blocks with records in that series.
Thus, processor f + p is responsible for the second series. We
seek to direct the p processors with records in the first series
to work in unison and without memory conflicts to determine
where each of their block’s tails would need to go if they were
merged with the m ≤ n/k records of the second series. To
accomplish this, we now present a technique that is perhaps
best described as a sequence of phases of operations.

In the first phase, each processor with records in the first
series sets aside a copy of its block’s tail and its index
(an integer between f and f + p − 1, inclusive). Each also
sets aside two pieces of information from the second series:
processor i (f ≤ i < f + p) computes and saves a copy of the
offset h = (i − f + 1)(m/p) and a copy of the hth record of
the second series. We can now merge the 2p elements made up
of p tails and p selected records (dragging along the indexes
and the offsets) by reversing in parallel the selected records
and then invoking a bitonic merge, a task requiring O(log p)
time and O(p) extra space.

After  this,  each  processor  with  records  in  the  first  series 
examines  the  two  keys  in  its  temporary  storage.  If  a  processor 
finds  a  tail,  then  (with  the  use  of  the  tail’s  index)  it  reports  its 
own  index  to  the  processor  handling  the  block  from  which  the 
tail  originated.  Thus,  every  processor  can  determine  from  the 
movement  of  its  block’s  tail  just  how  many  of  the  records 
selected  from  the  second  series  are  smaller,  and  therefore 
which  of  the  p  subseries  of  the  second  series,  each  subseries 
of  size  m/p,  to  merge  into  next.  In  order  for  a  processor  to  be 
able  to  determine  how  many  other  tails  are  to  be  merged  into 
the  same  next  subseries  as  its  block’s  tail,  each  one  compares 
its  next  subseries  to  that  of  its  neighbors.  If  the  comparison 
reveals  a  subseries  boundary,  then  broadcasting  is  used  to 
inform  the  other  processors  of  the  location  of  this  boundary 
(as  we  did  when  broadcasting  a  breaker’s  location  in  the  series 
delimiting  step). 
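
A sequential simulation of the first phase is sketched below. It is meant only to show what each processor learns, not how the EREW machine learns it: a binary search over the selected records replaces the bitonic merge, p is assumed to divide m, and all names are hypothetical.

```python
from bisect import bisect_left

def first_phase(tails, second):
    """For each tail, find the subseries of `second` it must search next.

    `tails` holds the p block tails of the first series in order, and
    `second` holds the m records of the second series.  The result for a
    tail is the number of selected records smaller than it; a value of p
    means the tail follows the whole second series.
    """
    p, m = len(tails), len(second)
    assert m % p == 0, "sketch assumes p divides m"
    step = m // p
    offsets = [(j + 1) * step for j in range(p)]   # h for processors f..f+p-1
    samples = [second[h - 1] for h in offsets]     # the h-th record, 1-based
    return [bisect_left(samples, t) for t in tails]
```

A result of c for some tail means that the tail belongs among records c(m/p) + 1 through (c + 1)(m/p) of the second series, so later phases can confine their sampling to that subseries.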

For the second and each subsequent phase, processors
proceed as in the first phase, but now with new offsets and
selected records based on the proper subseries into which their
block’s tails are to be merged and the number of other tails
that are also to be merged there. Processors continue to iterate
this procedure until each has determined where its block’s
tail would go if it were merged with the other tails and the
second series. Note that some processors may be employed
in as few as log_k m phases, each requiring O(log k) time,
while others may simultaneously be employed in as many as
log_2 m phases, each requiring constant time. In general, letting
the sequence k_1, k_2, ..., k_i denote the number of tails in any
chain of recursive calls, we observe that k_1 × k_2 × ... × k_i is
O(m), and hence log k_1 + log k_2 + ... + log k_i is O(log
m). Therefore, O(log n) time and O(k) extra space have been
consumed up to this point.

Fig. 5. Series splitting. (a) Notation. (b) Block rotation. (c) Subblock rotation. (d) Data movement. (e) Subblock rotation.

Let l_i (1 ≤ l_i ≤ m + p) denote the location that the tail of
the block of processor i (f ≤ i < f + p) would occupy in
a sublist containing the p tails and the entire second series
if such a sublist were available. Processor i now computes
l'_i = l_i − (i − f) − 1, to eliminate the effect of its block’s
tail and all preceding tails. It next employs two pointers to
compare a record in its block, beginning at location n/k (its
tail), to a record in the second series, beginning at location
l'_i, repeatedly decrementing the pointer that points to the larger
key for l'_i iterations. (We insist that each processor works from
right to left in its interval of the second series in order to avoid
memory conflicts, and that processor i keeps track of l'_{i-1}
and l'_{i+1}, relying on broadcasting by the leftmost processor
if degeneracy in an interval occurs.) When processor i has
finished decrementing its two pointers in this fashion, a task
requiring O(n/k) time and O(k) extra space, the value of its
second series pointer is its displacement table entry, E_i.

Thus,  displacement  computing  can  be  accomplished  in 
O(n/k + log n) time and constant extra space per processor.
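
The two-pointer scan that produces a displacement table entry can be sketched for a single full block as follows. This is an illustration under stated assumptions: the count of second-series records preceding the tail is obtained here by a binary search rather than by the phased EREW procedure, ties are resolved in favor of the first series, and the names are hypothetical.

```python
from bisect import bisect_left

def displacement_entry(block, second):
    """Count the records of `second` among the len(block) smallest keys
    of `block` and `second` combined (both lists are sorted)."""
    # Records of the second series that precede the block's tail.
    remaining = bisect_left(second, block[-1])
    bp, sp = len(block), remaining      # 1-based pointers into block / second
    for _ in range(remaining):
        # Discard the larger of the two pointed-to keys; on a tie discard
        # the second-series record so first-series records keep priority.
        if sp == 0 or (bp > 0 and block[bp - 1] > second[sp - 1]):
            bp -= 1
        else:
            sp -= 1
    return sp
```

Each iteration discards the largest record still under consideration, so after the loop exactly len(block) records remain and the second-series pointer counts how many of them come from the second series. Since the loop runs at most m ≤ n/k times, the scan takes O(n/k) time.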

Series Splitting: At this point, processor i can easily determine
from the entries in the displacement table the number
of its records that are to be displaced to the block to its
right (E_i), as well as the number of records that it is to
receive from the block to its left (E_{i-1}) and from the second
series (E_i − E_{i-1}). Thus, we now seek to split, in parallel,
the  second  series  among  the  blocks  of  the  first  series.  We 
accomplish  this  efficiently  in  constant  extra  space  with  the 
use  of  block  rotations  (each  of  which  is  effected  with  a 
sequence  of  three  sublist  reversals),  followed  by  the  desired 
data  movement,  followed  by  one  last  reversal.  We  illustrate  this 
procedure  in  Fig.  5,  with  the  aid  of  some  additional  notation. 
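
A block rotation by three reversals, which needs only constant extra space, can be sketched as follows. The helper names are hypothetical, and the two adjacent pieces are simply called X and Y here.

```python
def reverse(a, lo, hi):
    """Reverse a[lo:hi] in place."""
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1

def rotate_block(a, start, x_len, y_len):
    """Turn the adjacent pieces X Y stored at a[start:] into Y X."""
    end = start + x_len + y_len
    reverse(a, start, end)              # (X Y) reversed is Y^R X^R
    reverse(a, start, start + y_len)    # restore the order inside Y
    reverse(a, start + y_len, end)      # restore the order inside X
```

Every record is moved at most twice and only one temporary cell is needed per exchange, so a processor can rotate its own block in O(n/k) time without help from the others.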

Letting i denote the index of an arbitrary processor with
records in the first series only, we use X_i to denote its first
n/k − E_i records (that is, those to remain in this block) and
Y_i to denote the remaining E_i records (that is, those to be
displaced to the right). We use Z to denote the contents of the
portion of a block that constitutes the second series. Processor
i first reverses X_i and Y_i together, then each separately,
thereby completing the rotation. Processor i then initiates data
movement, employing a single extra storage cell to copy safely
the last record of Y_i to the location formerly occupied by
the last record of Y_{i+1}. We deviate from this if processor
i is handling the last block of the first series, instructing it
instead to copy its last Y record to the former location of
the first Z record. At the same time, the processor of the
second series copies its first Z record to the former location
of the last Y record of the first (portion of a) block in the first
series. Continuing in this fashion, therefore, the data movement
sequence is right-to-left for the blocks in the first series, but
left-to-right for the second.

Fig. 6. Local merging. (a) Series splitting completed. (b) Local merging.

Of  course,  when  block  i  of  the  first  series  is  filled,  the 
processor  of  the  second  block  must  shift  its  attention  to  block 
i + 1, and so on. If k is small enough [no greater than O(log
n)], then the displacement table can simply be searched; if
k  is  larger  than  this,  then  the  table  may  contain  too  many 
identical  entries,  and  we  invoke  a  preprocessing  routine  to 
condense  it  (again  with  the  aid  of  broadcasting).  The  timing 
of the first and second series operations is interleaved (rather
than  simultaneous),  because  some  processors  will  in  general 
be  handling  portions  of  blocks  of  both  types  of  series. 

When  the  data  movement  phase  is  finished,  each  block 
will  contain  the  correct  prefix  from  the  opposite  series,  but 
in  reverse  order.  A  final  subblock  reversal  completes  this  step. 

Series splitting, therefore, requires O(n/k + log n) time and
constant  extra  space  per  processor. 

Local  Merging:  We  employ  the  aforementioned  linear-time, 
in-place  sequential  merge  from  [10].  The  completion  of  this 
merge  is  depicted  in  Fig.  6. 

Thus, this final step requires O(n/k) time and constant extra
space  per  processor. 

Implementation Details: Although the details necessary to
handle lists and sublists of arbitrary sizes are the most intricate
part of the sequential method, these details are quite simple
for our parallel algorithm. We first fragment the input list
L = L1L2 into the form L3L4L5L6, where both L3 and
L5 contain an integral multiple of n/k records, and where L4
and L6 each contain strictly less than n/k (even, possibly,
zero) records. With parallel rotations, it is easy to transform
the list into the form L3L5L4L6 assuming the tail of L4 is
less than or equal to the tail of L6 (or the form L3L5L6L4 if
it is greater). We now invoke the main parallel algorithm on
L3L5, yielding the sorted sublist L7. Ignoring obvious ways to
streamline the remainder of this procedure, it is sufficient at
this point merely to invoke the sequential algorithm on
L4L6, yielding the sorted sublist L8. Thus, L8 can be viewed
as at most one block of size n/k followed by at most one
block of size strictly less than n/k. We now complete the
merge by invoking the main parallel algorithm on L7L8, with
every processor except possibly the last handling a block of
size n/k. Even though the last block may have an unusual
size at this step, it causes no problems for the main algorithm
because its (large) tail ensures that it need not be moved during
block sorting and because its (rightmost) position ensures that
it need not be treated as a member of a first series when any
pair of series is merged.

The time and space requirements necessary for implementation
details are therefore bounded by those of the main parallel
algorithm. 

This  completes  the  description  of  our  parallel  method.  In 
summary, the total time spent is O(n/k + log n) and the total
extra space used is O(k). Therefore, this method is time-space
optimal for any value of k ≤ n/(log n), thereby meeting our
stated  goal. 
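
The optimality claim follows from a short calculation, written out here for convenience. T(n, k) is our shorthand for the parallel running time with k processors; it is not notation used elsewhere in the paper.

```latex
T(n,k) = O\!\left(\frac{n}{k} + \log n\right), \qquad
k \le \frac{n}{\log n}
\;\Longrightarrow\; \log n \le \frac{n}{k}
\;\Longrightarrow\; T(n,k) = O\!\left(\frac{n}{k}\right)
\;\Longrightarrow\; k \cdot T(n,k) = O(n).
```

The processor-time product thus matches the linear time of sequential merging, while the O(k) total extra space amounts to a constant per processor.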

IV.  Ensuring  Stability 

It  is  often  desirable  that  merging  (and  sorting)  algorithms 
be  stable,  by  which  we  mean  that  records  with  identical 
keys  retain  their  original  relative  order  after  the  algorithm 
is  completed.  Stability  is  a  property  that  has  extracted  a 
heavy  price  in  terms  of  increased  complexity  for  sequential 
algorithms that operate in both optimal time and space simultaneously.
The linear-time, in-place stable sequential merging
algorithm  with  the  lowest  currently-known  worst  case  constant 
of  proportionality  is  presented  in  [11]  and  is  based  largely 
on  the  unstable  method  that  proved  useful  in  guiding  our 
thinking  in  devising  the  parallel  algorithm  presented  in  the 
last  section.  As  one  rough  measure  of  the  intricacy  required 
to  ensure  stability  in  a  sequential  setting,  we  note  that  the 
worst  case  constant  of  proportionality  jumps  from  3.125n  (plus 
lower  order  terms)  for  the  unstable  algorithm  of  [10]  to  In 
(plus  lower  order  terms)  for  the  stable  scheme  of  [11],  where 
these  values  reflect  an  upper  bound  on  the  number  of  key 
comparisons  plus  record  exchanges  required. 

Fortunately, however, the parallel procedure we have already
presented can be made stable with relatively little effort.
The  only  unstable  routines  in  our  main  algorithm  are  found 
in  the  steps  for  block  sorting,  displacement  computing,  and 
local  merging.  Instability  in  the  block  sorting  step  can  be 
remedied  by  stabilizing  the  bitonic  merge.  To  accomplish  this, 
we  need  only  specify  that  the  block  indexes  (which  are  already 
available)  are  to  be  used  as  “tie  breakers”  whenever  equal 
tails  are  compared.  The  displacement  computing  step  can  be 
modified  in  a  similar  manner,  by  first  stabilizing  the  bitonic 
merge  (indexes  and  offsets  are  already  available)  and  then 
handling  the  two  (now  asymmetric)  types  of  pairs  of  series  in 
slightly different fashions in that L1 records must now receive
priority over L2 records. The local merging step is stabilized by
replacing the relatively simple but unstable in-place algorithm
with the more complicated but stable in-place scheme. Only an
extra pointer is needed to stabilize the implementation details
(in the event that the L3 and L4 sublists each have a copy of
the same key).
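
The tie-breaking rule for the bitonic merge can be illustrated with a single compare-exchange on (tail key, block index) pairs. This is a sketch of the idea only; the pair representation and the function name are assumptions.

```python
def compare_exchange(items, i, j):
    """Order items[i] and items[j], where each item is (tail_key, block_index).

    Python compares tuples lexicographically, so the block index decides
    the outcome only when the two tail keys are equal.
    """
    if items[i] > items[j]:
        items[i], items[j] = items[j], items[i]

tails = [(7, 3), (7, 1)]
compare_exchange(tails, 0, 1)
```

Because the block indexes are distinct, no two items ever compare as equal, and any comparison network built from this operation orders blocks with equal tails by their original positions.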

V.  Extensions  to  Sorting  and  Open  Problems 

In  this  paper,  we  have  presented  for  the  first  time  parallel 
merging algorithms that are asymptotically time-space optimal.
Moreover, our methods assume only the EREW PRAM
model. Although n must be large enough so that the inequality
k ≤ n/(log n) is satisfied for optimality, we observe that our
algorithms  are  efficient5  for  any  value  of  n,  suggesting  that 
they  may  have  practical  merit  even  for  relatively  small  inputs. 
Also,  for  the  sake  of  complete  generality,  our  algorithms 
modify  neither  the  key  nor  any  other  part  of  a  record. 

These time-space optimal parallel merging algorithms naturally
lead to time-space optimal parallel sorting algorithms,
providing  improvements  over  the  best  previously-published 
PRAM  methods  designed  for  a  bounded  number  of  processors. 
For  example,  the  recent  EREW  merging  and  sorting  schemes 
proposed  in  [3]  (where  the  issue  of  duplicate  keys  is  not  even 
addressed) are time optimal only for values of k ≤ n/(log² n).
More  importantly,  such  schemes  are  not  space  optimal  for  any 
fixed  k. 

We  note  that,  from  a  practical  standpoint,  more  streamlined 
sorting  implementations  may  be  possible.  It  is  known  from 
[11] that methods exist by which the obvious merge sort strategy
can be replaced with more sophisticated sorting schemes
that  exploit  merging  in  nontrivial  ways.  In  that  setting,  for 
example,  the  worst  case  constant  of  proportionality  of  the 
direct  merge  sort  strategy  is  lowered  from  In  log  n  (plus 
lower  order  terms)  to  2.5 n  log  n  (plus  lower  order  terms). 
Whether  these  more  complicated  techniques  can  be  efficiently 
parallelized  remains  an  open  question. 

Finally,  from  a  more  purely  theoretical  perspective,  one 
might ask whether our methods can be extended to sublogarithmic
time merging. Because Ω(log n) time is known to be a
lower  bound  for  merging  on  an  EREW  PRAM,  our  algorithms 
are  the  best  possible  (to  within  a  constant  factor)  for  this 
model. Asymptotically faster time-space optimal algorithms
may  exist,  however,  for  more  powerful  models.  For  example, 
it  is  an  open  question  whether  time-space  optimal  merging 
can be accomplished in O(n/k + log log n) time on a CREW
PRAM.

Although polynomial-time complexity theory has been formulated in terms of decision problems,
polynomial-time decision algorithms generally operate by attempting to construct a solution to an
optimization version of the problem at hand. Thus it is that self-reducibility, the process by which a
decision  algorithm  may  be  used  to  devise  a  constructive  algorithm,  has  until  now  been  widely 
considered  a  topic  of  only  theoretical  interest.  Recent  fundamental  advances  in  graph  theory,  however, 
have  made  available  powerful  new  nonconstructive  tools  that  can  be  applied  to  guarantee  membership 
in  P.  These  tools  are  nonconstructive  at  two  distinct  levels:  they  neither  produce  the  decision 
algorithm,  establishing  only  the  finiteness  of  an  obstruction  set,  nor  do  they  reveal  whether  such  a 
decision  algorithm  can  be  of  any  aid  in  the  construction  of  a  solution.  We  briefly  review  and  illustrate 
the  use  of  these  tools,  and  discuss  the  seemingly  formidable  task  of  finding  the  promised  polynomial- 
time  decision  algorithms  when  these  new  tools  apply.  Our  main  focus  is  on  the  design  of  efficient  self- 
reduction  strategies,  with  combinatorial  problems  drawn  from  a  variety  of  areas. 

KEY  WORDS:  Polynomial-time  complexity,  search  problems,  self-reducibility,  well-partial-order 
theory. 

C.R.  CATEGORIES:  F.2.2.  [Analysis  of  algorithms];  G.2.2.  [Discrete  mathematics] 


1.  INTRODUCTION 

A  central  concern  of  concrete  complexity  theory  has  been  establishing  the 
boundaries of P, that is, determining exactly those problems that are decidable in
(deterministic) polynomial time. Decision problems have traditionally been shown
to  be  in  P  by  producing  efficient  algorithms  that  actually  attempt  to  solve  “search 
problems”  [13],  constructing  solutions  to  optimization  versions  of  the  problems 
whenever  such  solutions  exist.  Thus,  while  the  distinction  between  decision  and 
construction  has  remained  well-defined  in  the  study  of  NP-completeness,  this  same 
distinction has until now been rather blurred for problems in P.

This  situation  is  suddenly  and  dramatically  altered,  however,  as  a  consequence 
of  powerful,  easy-to-apply  graph  theory  tools  recently  made  available  by  the 
seminal  work  of  Robertson  and  Seymour.  See,  for  example,  [27,  28,  29,  30].  When 
these  tools  can  be  employed,  they  classify  problems  as  decidable  in  polynomial 
time  by  proving  merely  the  existence  of  polynomial-time  decision  algorithms  [6,  7, 
8].  More  importantly,  for  the  subject  of  this  paper,  there  is  no  guarantee  that  an 
optimal  solution  can  in  fact  be  constructed  in  polynomial  time,  even  if  an  efficient 
decision algorithm is found. These developments, therefore, call into question the
previous  folk  wisdom  that  only  the  complexity  of  decision  problems  need  be 
addressed  [21].  Moreover,  they  motivate  a  serious  study  of  efficient  strategies  for 
employing  decision  algorithms  to  construct  solutions  to  concrete  problems.  This 
process,  termed  self-reducibility,  has  until  now  been  primarily  a  subject  of 
theoretical  interest  in  investigating  complexity  classes  [19,  24,  31]. 

In  the  next  section,  we  briefly  review  the  background  material  necessary  to  state 
the relevant results from graph theory. In Section 3, we demonstrate their use by
means  of  a  simple  example.  In  Section  4,  we  establish  the  self-reducibility  of  four 
illustrative problems, in two cases improving on the best previously known
bounds and in two cases proving polynomial-time constructivity for the first time.
Section  5  consists  of  a  collection  of  general  remarks  pertinent  to  future  research  on 
this  topic.  We  also  mention  some  other  problems  amenable  to  this  nonconstructive 
approach  that  we  have  not  been  able  to  self-reduce  and  discuss  possible 
explanations  for  this  difficulty. 


2.  THEORETICAL  FOUNDATIONS 

In  this  section,  we  first  give  some  basic  definitions  in  order  to  enable  us  to  state  a 
fundamental  theorem  due  to  Robertson  and  Seymour.  All  graphs  we  consider  are 
finite  and  undirected,  and  may  have  loops  or  multiple  edges. 

Given a graph G = (V, E), a graph H is a minor of G, denoted H ≤ G, if H can be
obtained from a subgraph of G by contracting edges. For example, the graph of
the  wheel  with  four  spokes  is  a  minor  of  the  graph  of  the  three-dimensional  binary 
cube,  as  can  be  seen  by  the  construction  depicted  in  Figure  1  (other  constructions 
suffice  as  well  for  this  example). 
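
One such construction can be checked mechanically. The following is a minimal Python sketch, not taken from the paper: it assumes a simple-graph representation (loops and parallel edges created by a contraction are discarded), labels the vertices of Q3 by the integers 0 to 7, and uses a hypothetical helper named contract. Collapsing one face of the cube to a single hub vertex leaves the opposite face as the rim of the wheel.

```python
def contract(edges, u, v):
    """Contract edge {u, v}: merge v into u, dropping loops and duplicate edges."""
    merged = set()
    for a, b in edges:
        a = u if a == v else a
        b = u if b == v else b
        if a != b:  # the contracted edge itself becomes a loop and is discarded
            merged.add(frozenset((a, b)))
    return merged


# Q3: vertices 0..7, adjacent exactly when their labels differ in one bit.
q3 = {frozenset((a, a ^ (1 << bit))) for a in range(8) for bit in range(3)}

# Collapse the face {4, 5, 6, 7} onto vertex 4 by three successive contractions.
minor = q3
for u, v in [(4, 5), (4, 6), (4, 7)]:
    minor = contract(minor, u, v)

# W4: hub 4 joined to every vertex of the rim cycle 0-1-3-2-0.
wheel = {frozenset(e) for e in
         [(0, 1), (1, 3), (3, 2), (2, 0), (4, 0), (4, 1), (4, 2), (4, 3)]}
```

No edge or vertex deletions are needed in this particular construction; the three contractions alone turn Q3 into W4.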

Note that the relation ≤ defines a partial ordering on finite graphs. A family F
of finite graphs is said to be closed under minors if, whenever G is in F, every
minor of G must also be in F. Robertson and Seymour [29] have shown that, for
every fixed graph H, the problem that takes as input a graph G and determines
whether H ≤ G is solvable in polynomial time.

Figure 1 Construction demonstrating that W4 is a minor of Q3 (H = W4).

Kuratowski’s Theorem [20] can be restated as follows: a graph G is planar if
and only if neither K5 ≤ G nor K3,3 ≤ G. The obstruction set for a minor-closed
family F of finite graphs is the set of graphs in the complement of F that are
minimal in the minor ordering. Answering in the affirmative a long-unresolved
conjecture due to Wagner [32], Robertson and Seymour have shown [30] that
any set of finite graphs contains only a finite number of minor-minimal elements.
(In other words, graphs are well-partially-ordered by minors.) This result is
inherently nonconstructive in the following sense [10]: there can be no systematic
method for computing the finite obstruction set for an arbitrary minor-closed
family F from the description of a Turing machine that accepts precisely the
graphs in F. Therefore, although we are assured of a finite obstruction set, the
proof of the theorem gives no information about how to find the set, how to
determine its cardinality, or even how to bound the order of the largest graph
it contains. In this paper, we summarize the above results and state the
following, denoting it as the RS (Robertson-Seymour) Theorem.

RS Theorem Any family of finite graphs that is closed under minors can be
recognized in polynomial time; specifically, in O(|V|^3) time. Moreover, if the family
excludes a planar graph, then membership can be decided in O(|V|^2) time.


3.  A  SIMPLE  APPLICATION  OF  THE  RS  THEOREM 

To illustrate the use of the RS Theorem, we consider the following fixed-parameter
variant of the NP-complete longest path problem [13]. In this k longest path (kLP)
problem, we are given a graph G and are asked whether G contains a simple path
with k or more edges. Obviously, this variant can be solved by brute force in
polynomial time. We simply check all (|V| choose k+1)(k+1)!/2 possible paths of
length k. Rather than attacking the problem directly, we now show that, as an
immediate corollary to the RS Theorem, there must exist a low-order
polynomial-time algorithm.

Theorem 2.1 The kLP problem can be decided in O(|V|^2) time.

Proof Observe that the family of “yes” instances for kLP is not closed under
minors, because taking a subgraph or contracting edges can eliminate long paths.
Fortunately, however, the family of “no” instances is closed under minors. That
is, if G has no simple path of length greater than or equal to k, then taking a
subgraph or contracting edges could never create one. Thus the RS Theorem
applies. Moreover, the obstruction set for this simple problem has only one
element, namely, the path of length k. This is certainly planar, so kLP can be
decided in O(|V|^2) time. □

Of  course,  all  we  have  argued  is  the  existence  of  a  quadratic-time  algorithm  for 
the decision version of kLP. We seek to know whether Theorem 2.1 can be used to
tell  us  anything  about  how  long  it  takes  to  construct  a  path  of  length  k  or  more  if 
any  exist.  As  we  shall  show  in  the  next  section,  it  turns  out  that  kLP  is 
self-reducible. 


4.  POLYNOMIAL-TIME  SELF-REDUCIBILITY 

Theorem 3.1 A solution to the kLP problem can be constructed, if any exist, in
O(|V|^2 log |V|) time.

Proof For the purpose of illustration, we first describe a simple method to
construct a solution in O(|V|^4) time. In general, we treat the decision algorithm
much like an oracle. Suppose we are given a graph G for which the decision
algorithm tells us that there is a path of length at least k. We delete some edge, e,
(and any multiple copies of e) and again ask the decision question, this time on
G - {e}. If the answer is “yes”, then (all copies of) e can be discarded and the
remaining graph still has a path of length at least k. On the other hand, if the
answer is “no”, then that edge is needed to build a path of length k. Therefore, by
calling the decision algorithm at most |E| - 1 times, we have located an appropriate
path. (Notice that the order in which edges are considered may affect which
path is discovered when G contains multiple solutions.)
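
The edge-deletion strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function has_long_path is a brute-force stand-in for the quadratic-time decision algorithm whose existence the RS Theorem guarantees, the graph is assumed simple (no multiple edges), and both function names are hypothetical.

```python
from itertools import permutations


def has_long_path(vertices, edges, k):
    """Stand-in oracle (brute force): is there a simple path with k edges?"""
    edge_set = {frozenset(e) for e in edges}
    for seq in permutations(vertices, k + 1):
        if all(frozenset((seq[i], seq[i + 1])) in edge_set for i in range(k)):
            return True
    return False


def find_path_edges(vertices, edges, k):
    """Discard each edge whose deletion keeps the answer "yes".

    Every surviving edge is needed, so the survivors are exactly the
    k edges of one path of length k.
    """
    if not has_long_path(vertices, edges, k):
        return None
    kept = list(edges)
    for e in list(edges):
        trial = [f for f in kept if f != e]
        if has_long_path(vertices, trial, k):
            kept = trial  # e is not needed; drop it for good
    return kept
```

A graph with a path of length at least k also has one of length exactly k, which is why the stand-in oracle only looks for paths with exactly k edges.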

In order to self-reduce more efficiently, we arbitrarily partition V into k + 2
classes V_i, 1 ≤ i ≤ k + 2, of cardinalities as equal as possible. (That is,
||V_i| - |V_j|| ≤ 1 for 1 ≤ i, j ≤ k + 2.) Since G contains a path of length k on
k + 1 vertices, so does G - V_s for some s, 1 ≤ s ≤ k + 2. Such an s can be
determined by deciding at most k + 2 instances of kLP. By repeating this
procedure, next on G - V_s, we can reduce G to a subgraph G' of order k + 1
containing a path of length k in O(log |V|) steps. At this point, there is only a
constant amount of work left to do to identify an appropriate path. Since each
call to the decision algorithm requires O(|V|^2) time, the time bound in the
statement of the theorem is assured. □
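
The partition strategy of this proof can be sketched as follows. Again this is a hedged illustration rather than the authors' implementation: has_long_path is a brute-force stand-in for the decision algorithm, the graph is assumed simple, and the names induced and shrink_to_path_vertices are hypothetical.

```python
from itertools import permutations


def has_long_path(vertices, edges, k):
    """Stand-in oracle (brute force): is there a simple path with k edges?"""
    edge_set = {frozenset(e) for e in edges}
    for seq in permutations(vertices, k + 1):
        if all(frozenset((seq[i], seq[i + 1])) in edge_set for i in range(k)):
            return True
    return False


def induced(edges, keep):
    """Edges of the subgraph induced by the vertex set keep."""
    return [(a, b) for a, b in edges if a in keep and b in keep]


def shrink_to_path_vertices(vertices, edges, k):
    """Repeatedly discard one of k + 2 balanced classes that a path avoids."""
    verts = list(vertices)
    if not has_long_path(verts, edges, k):
        return None
    while len(verts) > k + 1:
        # Round-robin slicing gives class sizes that differ by at most one.
        classes = [verts[i::k + 2] for i in range(k + 2)]
        for cls in classes:
            rest = [v for v in verts if v not in cls]
            if has_long_path(rest, induced(edges, set(rest)), k):
                verts = rest
                break
        else:
            # Cannot happen: k + 1 path vertices must miss one of k + 2 classes.
            raise AssertionError("some class must be avoidable")
    return verts
```

The k + 1 vertices that remain carry a path of length k, which can then be recovered with a constant amount of additional work.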

For  sufficiently  dense  graphs,  this  improves  on  the  lowest  previously-published 
upper bound (both for decision and construction) of O(|V||E|) for kLP [23].

Although  things  have  gone  nicely  for  this  simple  example,  we  remind  the  reader 
that  the  RS  Theorem  is  nonconstructive.  No  general  method  for  isolating 
obstruction  sets  is  known.  Moreover,  even  if  an  obstruction  set  were  available, 
there  is  absolutely  no  guarantee  that  computing  an  optimal  solution  can  be 
accomplished  within  any  given  time-complexity  class.  Therefore,  we  must  be  aware 
of  the  (counter-intuitive)  possibility  that  there  may  exist  natural  problems  whose 
decision  versions  never  need  more  than  low-order  polynomial  time  while  their 
seemingly closely-related optimization versions require, in the worst case at least,
exponential  or  worse  time! 

Consider now the minimum cut linear arrangement problem [13], which is
NP-complete even when input is restricted to planar graphs of maximum degree
three [25], but can be solved in O(|V| log |V|) time for trees [33]. In the relevant
fixed-parameter variant of this problem (kMCLA), we are given a graph G and are
asked whether there is an arrangement of the vertices of G along a horizontal line
so that any vertical line placed between consecutive vertices cuts at most k edges
connecting vertices on opposite sides of the vertical line.
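
To make the definition concrete, the following Python sketch computes the largest number of edges cut by any vertical line for one given arrangement. The function name max_cut and the edge-list representation are assumptions made for illustration; an arrangement is acceptable for kMCLA exactly when the returned value is at most k.

```python
def max_cut(order, edges):
    """Largest number of edges crossing any gap between consecutive vertices."""
    position = {v: i for i, v in enumerate(order)}
    worst = 0
    for gap in range(len(order) - 1):  # the gap between positions gap and gap + 1
        crossing = sum(
            1 for a, b in edges
            if min(position[a], position[b]) <= gap < max(position[a], position[b])
        )
        worst = max(worst, crossing)
    return worst
```

Multiple edges are handled by listing an edge once per copy.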

In [22], it has been demonstrated that kMCLA is solvable in polynomial time, with
an exponent that depends on k, by dynamic programming. In [8], it has been shown
that the RS Theorem can be applied to guarantee that kMCLA is decidable in
O(|V|^4) time. This result relies on a cost-preserving transformation [25] that
reduces an arbitrary graph of order |V| to an equivalent graph of maximum degree
three with order O(|V|^2). It remains only to show [8] that the family of graphs of
maximum degree three that are “yes” instances of kMCLA is closed under minors.
Hence we face the issue of self-reducibility.

Theorem 3.2 A solution to kMCLA can be constructed, if any exist, in O(|V|^6)
time.

Proof Assume that G is connected and that kMCLA(G) = “yes”. Obviously, no
two  vertices  have  more  than  k  edges  between  them. 

We  first  find  a  vertex  that  can  serve  as  a  starting  point  (leftmost  or  rightmost 
vertex)  in  a  satisfactory  arrangement.  To  do  this,  we  choose  an  arbitrary  vertex  y 
of  V,  add  a  new  vertex  x,  and  augment  E  with  k  edges  between  x  and  y,  thereby 
obtaining a new graph G'. Clearly, for some choice of y, it must be that
kMCLA(G') = “yes”, implying that all permissible arrangements of G' begin with x,
then y (the connectedness of G rules out all other possibilities). Finding such a
vertex y requires at most O(|V|^5) time.

We  now  build  on  G'  to  find  a  second  vertex.  To  accomplish  this,  we  choose  an 
arbitrary vertex z of V - {x, y}, augment E as necessary so that there are k edges
between y and z, and replace each edge of the form (y, a), a ∉ {x, z}, with (z, a),
thereby obtaining a new graph G". It follows that, for some choice of z, it must be
that kMCLA(G") = “yes”, implying that all permissible arrangements for G" begin
with x, then y, then z. Conversely, a permissible arrangement of the vertices of G"
is a permissible arrangement for G' as well. Finding such a vertex z requires
O(|V|^5) time.

We repeat this construction, each time modifying the graph so as to freeze a
prefix of some satisfactory arrangement, producing a solution in O(|V|^6) time. □

We  now  move  on  to  an  important  NP-complete  non-graph  problem  [26]  that 
has  been  the  subject  of  much  study  in  the  VLSI  community.  This  fundamental, 
combinatorial  problem  lies  at  the  heart  of  a  number  of  circuit  layout  styles,  and 
has  accordingly  acquired  several  names,  most  notably  gate  matrix  layout  and 
multiple  PLA  folding.  (We  adopt  the  former.)  An  instance  of  the  gate  matrix  layout 
problem  consists  of  a  set  of  n  nets  (rows)  and  their  respective  connections  to  a  set 
of  m  gates  (columns).  The  goal  is  to  find  a  permutation  of  the  gates  that  permits 
the circuit to be laid out in k or fewer tracks. More formally, our fixed-parameter
variant  can  be  described  as  follows. 

k gate matrix layout (kGML)

Instance: An n x m Boolean matrix M.

Question: Can the columns of M be permuted in such a way that, if in each row
we change to 1 every 0 lying between the row’s leftmost and rightmost
1’s, then no column contains more than k 1’s?
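
The cost of one particular column permutation under this definition can be computed directly. The sketch below is an illustration only; the name tracks_needed and the list-of-rows matrix representation are assumptions. A permutation answers the kGML question affirmatively exactly when the returned value is at most k.

```python
def tracks_needed(matrix, column_order):
    """Maximum column load after filling each row between its outermost 1s."""
    permuted = [[row[j] for j in column_order] for row in matrix]
    load = [0] * len(column_order)
    for row in permuted:
        ones = [p for p, bit in enumerate(row) if bit == 1]
        if ones:
            # Every position from the leftmost to the rightmost 1 counts as a 1.
            for p in range(ones[0], ones[-1] + 1):
                load[p] += 1
    return max(load, default=0)
```

For the 3 x 3 matrix with rows 101, 010 and 110, the identity order needs three tracks, while placing the middle column first needs only two.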

The kGML problem appears exceedingly difficult. For any k ≥ 2, there are
instances of kGML with only two satisfactory permutations and m! - 2
unsatisfactory ones [5], precluding any polynomial-time brute-force attack that
focuses on a predetermined set of column permutations. Surprisingly, however, it
has been shown in [6] that the matrix M can be modeled as a graph G(M) on n
vertices so that the family of graphs corresponding to “yes” instances is closed
under minors for every fixed k. Moreover, planar obstructions exist for each k [6].
Therefore, we know that the RS Theorem is applicable and kGML is decidable in
O(n^2) time. (Incidentally, for k = 2, it is known that there are only two simple
obstructions. For k = 3, the size of the obstruction set has been bounded and its
elements, now numbering at least 110, are being enumerated [2]. Determining
obstruction sets for k ≥ 4 is, as of this writing, an open problem.)

Recall,  however,  that  our  goal  is  to  find  a  satisfactory  permutation  of  the 
columns of M, if any exist. (Given such a permutation, it is easy to complete the
track  assignments  of  the  layout  with  a  simple  greedy  rule  [16,  26].)  Inspired  by  the 
existence  of  the  aforementioned  decision  algorithm,  we  now  show  that  kGML  is 
self-reducible. 

Theorem 3.3 A solution to kGML can be constructed, if any exist, in O(n^3 m) time.

Proof  Suppose  kGML(M)  =  “yes”.  We  then  attempt  to  modify  M  by  executing 
a  subprogram  described  in  pidgin  Algol  as  follows: 

    for i ← 1 to n do
        for j ← 1 to m do
            if M(i, j) = 0 then begin
                M(i, j) ← 1
                if kGML(M) = “no” then M(i, j) ← 0
            end
        end do
    end do.

Therefore, after processing all of M, kGML(M) = “yes” still holds; no remaining 0
can be changed to a 1. The resultant matrix must now enjoy the “consecutive 1s
property”. That is, M now possesses a column permutation such that, for every
row, the leftmost and rightmost 1s are not separated by even a single 0. We
employ the PQ-Tree algorithm as described in [3, 14] to identify such a
permutation in O(nm) time. (Notice that the solution so obtained may be
dependent on the order in which the elements of M were considered.) □
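
The whole self-reduction can be sketched in Python under loudly stated assumptions: kgml is a brute-force stand-in for the O(n^2) decision algorithm, and consecutive_ones_order is a brute-force substitute for the PQ-Tree algorithm of [3, 14], not an implementation of it. All function names are hypothetical, and the sketch is only practical for very small matrices.

```python
from itertools import permutations


def tracks_needed(matrix, column_order):
    """Maximum column load after filling each row between its outermost 1s."""
    load = [0] * len(column_order)
    for row in matrix:
        ones = [p for p, j in enumerate(column_order) if row[j] == 1]
        if ones:
            for p in range(ones[0], ones[-1] + 1):
                load[p] += 1
    return max(load, default=0)


def kgml(matrix, k):
    """Stand-in oracle: try every column permutation (exponential time)."""
    m = len(matrix[0])
    return any(tracks_needed(matrix, order) <= k
               for order in permutations(range(m)))


def fill_in(matrix, k):
    """Turn each 0 into a 1 whenever the answer stays "yes"."""
    M = [row[:] for row in matrix]
    if not kgml(M, k):
        return None
    for i in range(len(M)):
        for j in range(len(M[i])):
            if M[i][j] == 0:
                M[i][j] = 1
                if not kgml(M, k):
                    M[i][j] = 0  # the change broke the instance; undo it
    return M


def consecutive_ones_order(matrix):
    """Brute-force substitute for the PQ-Tree step: find an order in which
    the 1s of every row are contiguous."""
    m = len(matrix[0])
    for order in permutations(range(m)):
        rows_ok = True
        for row in matrix:
            ones = [p for p, j in enumerate(order) if row[j] == 1]
            if ones and ones[-1] - ones[0] + 1 != len(ones):
                rows_ok = False
                break
        if rows_ok:
            return list(order)
    return None
```

Because the filled-in matrix dominates the original entry by entry, a column order found for the filled-in matrix also lays out the original matrix in at most k tracks.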

We  conclude  this  section  with  a  generic  form  of  fixed-parameter  problem.  Let  F 
denote  an  arbitrary  minor-closed  family  of  graphs. 

within k vertices of F (wkvF)

Instance: A graph G.

Question: Does G contain a set of k or fewer vertices that, when deleted, leave a
graph in F?

As examples, k-vertex cover and k-feedback vertex set are representatives of
wkvF when F is the family of edgeless graphs and the family of acyclic graphs,
respectively. Also, when F is the family of planar graphs, the notion of
“planarizing sets” has been addressed in [12, 15].

For every fixed value of k and every minor-closed family, F, this problem is
trivially decidable in polynomial time by brute force. One need only check the
(|V| choose k) graphs that result from the removal of k vertices. This can be done
in O(|V|^k p(|V|)) time, where p(|V|) bounds the time required to test for
membership in F. However, it has been shown in [7] that wkvF can in fact be
decided in only O(|V|^3) time. We shall now prove that this permits a low-order
polynomial-time self-reduction strategy as well, enabling the construction of a
solution in less than the O(|V|^k p(|V|)) time bound given by brute force.

Theorem 3.4 A solution to wkvF can be constructed, if any exist, in O(|V|^4) time.

Proof Let H denote some fixed member of the (finite) obstruction set for F,
and suppose wkvF(G) = “yes”. We now identify an arbitrary vertex, x, of G with an
arbitrary vertex, y, of H. That is, we construct G' = (V_G', E_G') where

V_G' = (V_G - {x}) ∪ (V_H - {y}) ∪ {z}

and

E_G' = (E_G - {edges incident to x}) ∪ (E_H - {edges incident to y})
       ∪ {e = (a, z) | (a, x) ∈ E_G or (a, y) ∈ E_H}.

(Multiple copies of edges can be discarded.) Since H is a minor-minimal element of
the complement of F, the removal of exactly one vertex from our copy of H in G' is
now necessary; it may as well be z. It follows that we can use x in a solution for G
if and only if wkvF(G') = “yes”. Since H is fixed, |V_G'| is O(|V|) and we can decide
wkvF(G') in at most O(|V|^3) time.

After considering x, we move on to employ this construction with another copy
of H at yet another vertex of G, building on G' (G) if wkvF(G') = “yes” (“no”). By the
time we have considered each vertex in G, we must have isolated a k-vertex
solution, satisfying the time bound in the statement of the theorem. □
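
For one concrete family, the construction is easy to sketch. Take F to be the edgeless graphs, so that wkvF is k-vertex cover and the single obstruction H is one edge; identifying x with an endpoint of H then amounts to hanging a pendant vertex on x. In the Python sketch below, has_cover is a brute-force stand-in for the decision algorithm, x keeps its own name instead of being renamed z, and all identifiers are hypothetical.

```python
from itertools import combinations


def has_cover(vertices, edges, k):
    """Stand-in oracle: can at most k vertex deletions leave an edgeless graph?"""
    verts = list(vertices)
    for size in range(min(k, len(verts)) + 1):
        for chosen in combinations(verts, size):
            picked = set(chosen)
            if all(a in picked or b in picked for a, b in edges):
                return True
    return False


def construct_cover(vertices, edges, k):
    """Glue a copy of H (one edge) at each vertex in turn and ask the oracle."""
    verts, cur_edges = list(vertices), list(edges)
    if not has_cover(verts, cur_edges, k):
        return None
    solution = []
    for x in vertices:
        pendant = ("pendant", x)  # the vertex of H other than y
        trial_verts = verts + [pendant]
        trial_edges = cur_edges + [(x, pendant)]
        if has_cover(trial_verts, trial_edges, k):
            # "yes": x can be used, so keep building on the enlarged graph.
            verts, cur_edges = trial_verts, trial_edges
            solution.append(x)
        # "no": continue from the previous graph.
    return solution
```

Each vertex triggers exactly one oracle call, which mirrors the accounting behind the time bound of the theorem.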


5.  CONCLUDING  REMARKS 

The  nonconstructive  methods  that  have  motivated  our  investigations  in  this  paper 
raise  serious  questions  about  the  relationship  between  algorithms  that  decide  and 
algorithms  that  construct  or  optimize.  This  relationship  has  in  the  past  often  been 
conveniently  (and  mistakenly)  taken  for  granted. 

An  interesting  aspect  of  these  tools  based  on  well-partially-ordered  sets  is  their 
power  to  demonstrate  polynomial-time  complexity  bounds  for  some  problems  for 
which  there  is  no  known  alternate  proof  available  for  membership  in  either  NP  or 
co-NP,  and  in  some  cases,  for  which  no  alternate  proof  is  known  even  for 
decidability!  It  is  just  such  problems  that  appear  at  present  to  resist  the  kinds  of 
self-reduction  strategies  that  we  have  illustrated  herein. 

For  example,  consider  the  problem  of  finding,  when  any  exist,  an  embedding  of 
a  graph  G  in  3-space  so  that  no  cycle  of  G  is  nontrivially  knotted  [4,  7].  The 
family  of  graphs  that  have  such  knotless  embeddings  is  closed  under  minors,  and 
therefore  the  problem  of  deciding  whether  a  graph  has  any  such  embedding  can  be 
solved in O(|V|^3) time. No alternate proof of decidability is known; the difficulties
of establishing such a proof appear to be significant [17, 18]. Similarly, no
self-reduction algorithm is known; the problem of devising one appears to be
formidable indeed.

The impact of these nonconstructive methods on the fields of complexity theory
and algorithm design remains to be determined. From a theoretical standpoint,
they inspire a study of constructive complexity [1] and a formal notion of fast
self-reducibility [9]. Also, new techniques for identifying polynomial-time algorithms
without  the  need  for  entire  obstruction  sets  have  begun  to  appear  [10].  If  useful 
decision  algorithms  are  possible  for  some  problems  amenable  to  this  general 
approach,  then  the  existence  and  efficiency  of  self-reduction  algorithms  may 
become  an  increasingly  practical  issue  as  well. 
A final technical report is enclosed.